Total Workers == (Number of GPUs) - 1?

I have a system with 4 GPUs, each of which can hold a single copy of the model being trained. When scaling up my RLlib experiment to use all 4 GPUs, I can only use 3 workers, yet all 4 GPUs are being utilized (their GPU RAM fills up). Requesting 4 workers leaves the trial in ‘pending’, and the experiment never starts due to a lack of resources.

Does the RLlib trainer itself count as a worker, so that even though I set the number of workers to 3, there are actually 4 running? I’ll gladly follow up with specifics if not.

Hey @Gregory, great question!

For “normal” algorithms:

  • Rollout Workers => only used for environment sampling (and policy inference to compute actions): 1 CPU per worker
    Set the number of workers via config.rollouts(num_rollout_workers=n).

  • GPUs:
    Only used on the central learner side, NOT for environment sampling or policy inference. Set this via config.resources(num_gpus=n).

Only our “DDPPO” algorithm can actually utilize GPUs on the Rollout Workers, because it learns in a decentralized fashion. For this algorithm, you should set config.resources(num_gpus_per_worker=1) to make sure each rollout worker can utilize one of your GPUs.

The total number of CPUs being used is always the following (make sure your Ray cluster has that many CPUs available):
(num_rollout_workers * num_cpus_per_worker) + num_cpus_for_local_worker
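
As a quick sanity check, the formula above can be written as a small helper. This is only a sketch: total_cpus is a hypothetical function for illustration, not part of RLlib, and the numbers passed in are example values.

```python
def total_cpus(num_rollout_workers, num_cpus_per_worker, num_cpus_for_local_worker):
    """CPUs requested according to the formula above (hypothetical helper, not an RLlib API)."""
    return num_rollout_workers * num_cpus_per_worker + num_cpus_for_local_worker


# Example values only: 3 rollout workers with 1 CPU each, plus 1 CPU for the local worker.
print(total_cpus(num_rollout_workers=3, num_cpus_per_worker=1, num_cpus_for_local_worker=1))
```

If the result exceeds the number of CPUs in your Ray cluster, the request cannot be satisfied.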

To get an accurate idea of which resources will be requested, you can also call your algorithm class’s default_resource_request method and pass in your config, like so:

from ray.rllib.algorithms.ppo import PPO, PPOConfig

# 2 GPUs + 3 CPUs for the local worker (learner); 1 GPU + 2 CPUs per rollout worker.
config = PPOConfig().resources(
    num_gpus=2,
    num_gpus_per_worker=1,
    num_cpus_per_worker=2,
    num_cpus_for_local_worker=3,
)

resources_needed = PPO.default_resource_request(config)
print(resources_needed)

This will give you the individual “bundles” requested by Ray Tune for a single trial, or by RLlib itself if you run an RLlib Algorithm directly, without Tune.
Note that the first bundle is for the local worker, and all following bundles are for the individual remote rollout workers.

<PlacementGroupFactory (_bound=<BoundArguments (bundles=[
    {'CPU': 3.0, 'GPU': 2.0},  # local worker
    {'CPU': 2.0, 'GPU': 1.0},  # remote worker 1
    {'CPU': 2.0, 'GPU': 1.0},  # remote worker 2 (PPO by default has 2 rollout workers)
], strategy='PACK')>, head_bundle_is_empty=False)>
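
To see what the trial needs in total, you can add up the bundles printed above and compare the sum with what your machine offers. The sketch below copies the bundle values from that output into plain dicts; total_resources and fits are hypothetical helpers (not Ray or RLlib functions), and the "available" numbers are made-up example machines.

```python
# Bundle values copied from the printed PlacementGroupFactory above.
bundles = [
    {"CPU": 3.0, "GPU": 2.0},  # local worker
    {"CPU": 2.0, "GPU": 1.0},  # remote worker 1
    {"CPU": 2.0, "GPU": 1.0},  # remote worker 2
]


def total_resources(bundles):
    """Sum each resource type over all bundles (hypothetical helper)."""
    totals = {}
    for bundle in bundles:
        for name, amount in bundle.items():
            totals[name] = totals.get(name, 0.0) + amount
    return totals


def fits(totals, available):
    """True if every requested resource is covered by the available amount."""
    return all(amount <= available.get(name, 0) for name, amount in totals.items())


totals = total_resources(bundles)
print(totals)
print(fits(totals, {"CPU": 8, "GPU": 4}))  # example machine with 4 GPUs
print(fits(totals, {"CPU": 8, "GPU": 3}))  # example machine with 3 GPUs
```

The same arithmetic would explain the original question, assuming the setup gives one GPU to the learner and one to each rollout worker (the exact config was not posted): 3 workers plus the learner already occupy all 4 GPUs, so a 4th worker would need a 5th GPU and the trial stays pending.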