Ray Train: parallelize on a single GPU

Is it possible to parallelize on one GPU with multiple workers with PyTorch?
Something similar to this (but this does not work):

from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

scaling_config = ScalingConfig(num_workers=5, use_gpu=True)
trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=scaling_config,
    datasets={"train": train_dataset},
)

If I do this:

scaling_config = ScalingConfig(num_workers=5, use_gpu=True, resources_per_worker={"GPU": 0.2})

I get this error:

torch.distributed.DistBackendError: NCCL error in: …/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Duplicate GPU detected : rank 2 and rank 0 both on CUDA device 65000

NCCL does not support multiple workers (ranks) sharing a single GPU. You could try using gloo instead:

from ray.train.torch import TorchConfig, TorchTrainer

trainer = TorchTrainer(
    # ...,
    torch_config=TorchConfig(backend="gloo"),
)
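For example, a minimal end-to-end sketch (assuming a single-GPU machine; the training loop here is just a placeholder) that combines the gloo backend with fractional GPU scheduling could look like this:

from ray.train import ScalingConfig
from ray.train.torch import TorchConfig, TorchTrainer

def train_loop_per_worker(config):
    # Placeholder training loop; each of the 5 workers is scheduled
    # with a 0.2 share of the single physical GPU.
    ...

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(
        num_workers=5,
        use_gpu=True,
        resources_per_worker={"GPU": 0.2},
    ),
    torch_config=TorchConfig(backend="gloo"),
)
result = trainer.fit()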

That said, what are you trying to do here? Throughput should be higher if you just use the full GPU.

@kai
I want to simulate federated learning on a single GPU by parallelizing the client and server workflows across workers and allowing them to exchange model parameters.
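Roughly what I have in mind is something like this (a simplified sketch, not my actual code; the model and round count are placeholders). This function would be passed as train_loop_per_worker to the gloo-backed trainer from above, and each round it averages the parameters across all workers:

import torch
import torch.distributed as dist
from ray import train

def train_loop_per_worker(config):
    # Placeholder "client" model; in the real simulation each worker
    # would train on its own local data shard.
    model = torch.nn.Linear(10, 1)

    for round_idx in range(3):
        # ... local client training for this round would go here ...

        # FedAvg-style exchange: average every parameter across all
        # workers via the gloo process group that Ray Train sets up.
        world_size = dist.get_world_size()
        for param in model.parameters():
            dist.all_reduce(param.data, op=dist.ReduceOp.SUM)
            param.data /= world_size

        train.report({"round": round_idx})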

What does your training loop look like?