Ray train parallelize on single GPU

Is it possible to parallelize on one GPU with multiple workers in PyTorch? Something like this (but this does not work):

scaling_config = ScalingConfig(num_workers=5, use_gpu=True)
trainer = TorchTrainer(
    # ...,
    datasets={"train": train_dataset},
    scaling_config=scaling_config,
)

If I do this:

scaling_config = ScalingConfig(num_workers=5, use_gpu=True, resources_per_worker={"GPU": 0.2})

I get this error:

torch.distributed.DistBackendError: NCCL error in: …/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Duplicate GPU detected : rank 2 and rank 0 both on CUDA device 65000

NCCL does not support multiple ranks sharing a single GPU, which is why you see the "Duplicate GPU detected" error. You could try using the gloo backend instead:

trainer = TorchTrainer(
    # ...,
    torch_config=TorchConfig(backend="gloo"),
)

(TorchConfig is imported from ray.train.torch.)

That said, what are you trying to do here? Throughput should be higher if you just use the full GPU with a single worker.

I want to try a federated learning simulation on a single GPU by parallelizing the client and server workflows across workers and allowing exchange of model parameters.
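The parameter exchange I have in mind is roughly FedAvg-style averaging. A minimal Ray-independent sketch (all names here are illustrative, and the "models" are just NumPy arrays standing in for real parameters):

```python
import numpy as np

def fedavg(client_weights):
    """Server step: average the parameter dicts received from clients."""
    keys = client_weights[0].keys()
    return {k: np.mean([w[k] for w in client_weights], axis=0) for k in keys}

# Two hypothetical clients, each holding a tiny one-layer "model"
clients = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([2.0])},
]

server = fedavg(clients)
print(server["w"])  # [2. 3.]
print(server["b"])  # [1.]
```

In the Ray version, each client would run as a worker with a GPU fraction, and the server would gather and broadcast these dicts between rounds.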

What does your training loop look like?