Ray not scaling over multiple GPUs in the same node

Abhay_Goyal · March 29, 2024, 4:24am

Hi,
I am using SAC for my work and have a single node with 4 GPUs. However, when I run the code below, training does not seem to scale across them.

Could you please tell me why?

import ray
import torch
from ray.rllib.algorithms.sac import SACConfig

ray.init(num_gpus=4)
config = SACConfig().training(gamma=0.9, lr=0.01, train_batch_size=64)

# One learner worker per visible GPU, each assigned a single GPU.
config = config.resources(
    num_gpus_per_learner_worker=1,
    num_learner_workers=torch.cuda.device_count(),
)

# config = config.resources(num_gpus=torch.cuda.device_count())
config = config.rollouts(num_rollout_workers=100)

# Build an Algorithm object from the config and run 1 training iteration.
algo = config.build(env=MineEnv200x150)
algo.train(num_workers=100, use_gpu=True)
# model = ray.train.torch.prepare_model(model)

Here, MineEnv200x150 is my custom environment for a single agent. The program executes but does not scale over multiple GPUs.
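
One thing that may be worth checking is whether the node can actually satisfy everything this configuration asks for. The sketch below is plain Python and does not call Ray. The helper names (`requested_resources`, `shortfall`), the one-CPU-per-worker defaults, and the 32-core figure are all assumptions for illustration, not values from the post or confirmed RLlib behavior. In a real session, `ray.cluster_resources()` could supply the `available` dictionary instead.

```python
# Back-of-the-envelope resource check; pure Python, no Ray required.
# The per-worker CPU defaults are assumptions, not values from the post.

def requested_resources(num_learner_workers, num_gpus_per_learner_worker,
                        num_rollout_workers, num_cpus_per_rollout_worker=1,
                        num_cpus_per_learner_worker=1, num_cpus_for_driver=1):
    """Return the total CPUs and GPUs the configuration would ask for."""
    gpus = num_learner_workers * num_gpus_per_learner_worker
    cpus = (num_rollout_workers * num_cpus_per_rollout_worker
            + num_learner_workers * num_cpus_per_learner_worker
            + num_cpus_for_driver)
    return {"CPU": cpus, "GPU": gpus}


def shortfall(requested, available):
    """Return only the resources whose request exceeds what is available."""
    return {name: requested[name] - available.get(name, 0)
            for name in requested
            if requested[name] > available.get(name, 0)}


# Settings from the post: 4 learner workers (one per GPU), 100 rollout workers.
request = requested_resources(num_learner_workers=4,
                              num_gpus_per_learner_worker=1,
                              num_rollout_workers=100)

# Hypothetical node: 4 GPUs (from the post) and 32 CPU cores (assumed).
available = {"CPU": 32, "GPU": 4}

print("requested:", request)
print("shortfall:", shortfall(request, available))
```

If the shortfall is non-empty, the rollout workers rather than the GPUs could be the limiting resource, which would be one possible reason the run does not appear to scale.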
