Hello everyone,
I would like to know when I should use multiple GPUs per worker for a Ray Train job, i.e. when to specify a scaling config like:
ScalingConfig(num_workers=2, use_gpu=True, resources_per_worker={"GPU":4})
So far I haven't found any working examples of this setup; I've only seen the documentation say that it's possible.
This also raises a question for me: would the training code be significantly different from the single-GPU-per-worker version? Is there really any benefit to using multiple GPUs per worker for a Ray Train job?
When launching a distributed training job in Ray with 8 GPUs, which of the following scaling configs is recommended, and when should the multi-GPU-per-worker config be used?
ScalingConfig(num_workers=2, use_gpu=True, resources_per_worker={"GPU":4})
ScalingConfig(num_workers=8, use_gpu=True, resources_per_worker={"GPU":1})
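For reference, here is roughly the single-GPU-per-worker pattern I have in mind, loosely based on the PyTorch Fashion-MNIST example in the Ray Train docs (the tiny model, batch size, and the name train_func are just placeholders of mine):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

import ray.train
import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func():
    # With one GPU per worker, prepare_model moves the model to that GPU
    # and wraps it in DistributedDataParallel.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    model = ray.train.torch.prepare_model(model)

    dataset = datasets.FashionMNIST(
        root="data", train=True, download=True, transform=ToTensor()
    )
    loader = DataLoader(dataset, batch_size=64)
    # prepare_data_loader adds a DistributedSampler so each worker trains
    # on its own shard, and moves batches to the worker's GPU.
    loader = ray.train.torch.prepare_data_loader(loader)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for epoch in range(2):
        for X, y in loader:
            loss = loss_fn(model(X), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        ray.train.report({"epoch": epoch, "loss": loss.item()})


trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(
        num_workers=8, use_gpu=True, resources_per_worker={"GPU": 1}
    ),
)
result = trainer.fit()
```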
Taking the PyTorch Fashion-MNIST example (roughly sketched above) as a starting point, how should I modify it to fully utilize all 8 GPUs with a multi-GPU-per-worker scaling config like the one below, and is that considered good practice?
ScalingConfig(num_workers=2, use_gpu=True, resources_per_worker={"GPU":4})
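And here is my (unverified) guess at how the training function would change for that config: since each worker is assigned 4 GPUs, I assume all 4 are visible inside the worker process, so perhaps the model should be wrapped in torch.nn.DataParallel inside each worker instead of relying on prepare_model's DistributedDataParallel wrapping. I'm not sure this is correct, or whether gradients would still be synchronized between the two workers:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

import ray.train
import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func_multi_gpu():
    # My assumption: with resources_per_worker={"GPU": 4}, this worker sees
    # 4 GPUs, and DataParallel splits each batch across all of them.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to("cuda")
    model = nn.DataParallel(model)  # is this the intended pattern here?

    dataset = datasets.FashionMNIST(
        root="data", train=True, download=True, transform=ToTensor()
    )
    # Larger batch so DataParallel has something to split across the 4 GPUs.
    loader = DataLoader(dataset, batch_size=256)
    loader = ray.train.torch.prepare_data_loader(loader)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for epoch in range(2):
        for X, y in loader:
            loss = loss_fn(model(X), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        ray.train.report({"epoch": epoch, "loss": loss.item()})


trainer = TorchTrainer(
    train_func_multi_gpu,
    scaling_config=ScalingConfig(
        num_workers=2, use_gpu=True, resources_per_worker={"GPU": 4}
    ),
)
result = trainer.fit()
```

If this guess is off, a pointer to the intended pattern or a working example would be much appreciated.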