Hello,

I am using TorchTrainer to wrap a PyTorch Lightning training script for use with Ray Tune. I am wondering what the difference is between these two ScalingConfigs:
```python
scaling_config = ScalingConfig(
    trainer_resources={"CPU": 0},
    resources_per_worker={"CPU": 8, "GPU": 2},
    num_workers=1,
    use_gpu=True,
)
```
and
```python
scaling_config = ScalingConfig(
    trainer_resources={"CPU": 0},
    resources_per_worker={"CPU": 4, "GPU": 1},
    num_workers=2,
    use_gpu=True,
)
```
If my Ray cluster has, say, 4 GPUs and 16 CPUs, will both of these configurations launch 2 concurrent trials, with each trial utilizing 2 GPUs and 8 CPUs in total? And will both configurations use Lightning DDP under the hood? Thank you!
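
For context, here is a minimal sketch of how I am wiring the trainer into a Tuner. The `train_func` body and the search space below are placeholders, not my actual script; my real `train_func` builds a Lightning module and runs it with a `pl.Trainer` prepared for Ray:

```python
from ray import tune
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from ray.tune import Tuner


def train_func(config):
    # Placeholder for my Lightning training loop; the real script constructs
    # a pl.Trainer (with RayDDPStrategy) and calls trainer.fit().
    print("lr:", config["lr"])


# One of the two configurations in question.
scaling_config = ScalingConfig(
    trainer_resources={"CPU": 0},
    resources_per_worker={"CPU": 4, "GPU": 1},
    num_workers=2,
    use_gpu=True,
)

trainer = TorchTrainer(train_func, scaling_config=scaling_config)

# Hyperparameters for train_func are passed via "train_loop_config".
tuner = Tuner(
    trainer,
    param_space={"train_loop_config": {"lr": tune.loguniform(1e-4, 1e-1)}},
    tune_config=tune.TuneConfig(num_samples=4),
)
results = tuner.fit()
```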