How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hey! I’m trying to train PPO on a computationally expensive environment that needs to run on GPU.
I’m running a Ray cluster across multiple (e.g., 8) nodes with 8 GPUs each using Slurm.
How can I get PPO to use the available resources efficiently?
On a single node with 8 GPUs, it's straightforward to split resources across the driver and workers with the following settings (see the sketch after this list):
- num_workers (e.g., 35)
- num_gpus (e.g., 1)
- num_gpus_per_worker (e.g., 0.2)
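Concretely, the single-node run looks roughly like this (the environment name and framework are placeholders for my actual setup, and I'm using the classic config-dict / `tune.run` API):

```python
import ray
from ray import tune

ray.init()

config = {
    "env": "MyExpensiveGPUEnv",   # placeholder for my custom GPU-bound env
    "framework": "torch",
    "num_workers": 35,            # rollout workers
    "num_gpus": 1,                # GPU for the driver/learner
    "num_gpus_per_worker": 0.2,   # fractional GPU per rollout worker
}

# 35 workers * 0.2 GPU + 1 driver GPU = 8 GPUs total on the node
tune.run("PPO", config=config)
```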
But when I try to scale this to 8 nodes of 8 GPUs with the following settings:
- num_workers = 8 * 35 = 280
- num_gpus = 8 * 1 = 8
- num_gpus_per_worker = 0.2
the rollout workers no longer get any GPUs (see the second sketch below).
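For reference, the multi-node attempt is essentially the same sketch scaled up (again with placeholder names), connecting to the Ray cluster already launched through Slurm:

```python
import ray
from ray import tune

ray.init(address="auto")  # attach to the existing Slurm-launched Ray cluster

config = {
    "env": "MyExpensiveGPUEnv",   # placeholder for my custom GPU-bound env
    "framework": "torch",
    "num_workers": 8 * 35,        # 280 rollout workers across the cluster
    "num_gpus": 8,                # GPUs requested for the driver/learner
    "num_gpus_per_worker": 0.2,
}

# Expected: 280 * 0.2 = 56 worker GPUs + 8 driver GPUs = 64 GPUs in total,
# but in practice the rollout workers end up without GPU access.
tune.run("PPO", config=config)
```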
Can we run PPO on multiple nodes? What is the right way to set this up? Is DDPPO the only option, or can vanilla PPO work across multiple nodes?