Hello,

I am using TorchTrainer to wrap a PyTorch Lightning training script for use with Ray Tune. I am wondering what the difference is between these two ScalingConfigs:
```python
scaling_config = ScalingConfig(
    trainer_resources={"CPU": 0},
    resources_per_worker={"CPU": 8, "GPU": 2},
    num_workers=1,
    use_gpu=True,
)
```
and
```python
scaling_config = ScalingConfig(
    trainer_resources={"CPU": 0},
    resources_per_worker={"CPU": 4, "GPU": 1},
    num_workers=2,
    use_gpu=True,
)
```
If my Ray cluster has, say, 4 GPUs and 16 CPUs, will both of these configurations launch 2 concurrent trials, with each trial utilizing 2 GPUs and 8 CPUs in total? And will both configurations use Lightning DDP under the hood? Thank you!
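
For context, here is a minimal sketch of how I am wiring the trainer into a Tuner. The `train_func` body and the search space below are placeholders, not my actual script; my real `train_func` builds a Lightning module and runs it with a `pl.Trainer` prepared for Ray:

```python
from ray import tune
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from ray.tune import Tuner


def train_func(config):
    # Placeholder for my Lightning training loop; the real script constructs
    # a pl.Trainer (with RayDDPStrategy) and calls trainer.fit().
    print("lr:", config["lr"])


# One of the two configurations in question.
scaling_config = ScalingConfig(
    trainer_resources={"CPU": 0},
    resources_per_worker={"CPU": 4, "GPU": 1},
    num_workers=2,
    use_gpu=True,
)

trainer = TorchTrainer(train_func, scaling_config=scaling_config)

# Hyperparameters for train_func are passed via "train_loop_config".
tuner = Tuner(
    trainer,
    param_space={"train_loop_config": {"lr": tune.loguniform(1e-4, 1e-1)}},
    tune_config=tune.TuneConfig(num_samples=4),
)
results = tuner.fit()
```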