How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I have 4 GPUs and am trying to do distributed tuning where each tuning trial uses 2 GPUs for distributed training (i.e., 2 tuning trials running in parallel, using all 4 GPUs). However, when I run a simple CIFAR-10 example, it only ever uses 2 of the 4 GPUs and appears to run only one tuning trial at a time. I think I may be misunderstanding how to allocate `num_workers` and `num_gpus_per_worker`.
```python
from ray import tune
from ray.tune.integration.torch import DistributedTrainableCreator

# Wrap the training function so each trial launches 2 distributed workers.
trainable_cls = DistributedTrainableCreator(
    train_cifar,
    num_workers=2,
    num_gpus_per_worker=2,
    num_cpus_per_worker=8)

analysis = tune.run(
    trainable_cls,
    config=config,
    num_samples=4,
    stop={"training_iteration": 10},
    metric="accuracy",
    mode="max")
```
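If I'm reading the API correctly, the settings above reserve `num_workers * num_gpus_per_worker = 4` GPUs per trial, which would at least explain why only one trial runs at a time. Below is a sketch of what I thought should give two concurrent 2-GPU trials (same `train_cifar` and `config` as above); this reflects my interpretation of the parameters, which may be wrong:

```python
# Sketch of my intended setup -- assuming num_gpus_per_worker is the GPU
# count per worker process, so each trial reserves 2 workers x 1 GPU = 2 GPUs
# and two trials can run concurrently on 4 GPUs.
trainable_cls = DistributedTrainableCreator(
    train_cifar,
    num_workers=2,          # 2 distributed training workers per trial
    num_gpus_per_worker=1,  # 1 GPU per worker -> 2 GPUs per trial
    num_cpus_per_worker=8)

analysis = tune.run(
    trainable_cls,
    config=config,
    num_samples=4,
    stop={"training_iteration": 10},
    metric="accuracy",
    mode="max")
```

With these settings I'd expect all 4 GPUs to be busy, but I'm not sure which interpretation of `num_gpus_per_worker` is correct.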