I have a research machine with one GPU and 60 CPUs, and I’m training an RL agent on a computationally-expensive environment using PPO.
Seeing as PPO runs rollouts and then weight updates sequentially rather than in parallel, I’d rather the GPU not spend half the time sitting idle. Accordingly, I’d like to use my GPU for both inference and training. How would I go about designing my config such that this takes place?
To use your single GPU for both inference (rollouts) and training in RLlib PPO, set num_env_runners=0 so that rollouts and training both run on the main process, which can then use the GPU for both phases. This is the recommended setup when you have only one GPU and want to maximize its utilization, especially for compute-intensive environments. Set num_gpus=1 in your config so the main process is allocated the GPU, and do not allocate GPU fractions to rollout workers via num_gpus_per_env_runner, since that effectively asks for additional GPU capacity. For efficiency, keep a large number of environments per worker if possible; with only one GPU, all inference and training will share it sequentially on the main process.
Example config:
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("YourEnv")          # your registered env ID
    .env_runners(num_env_runners=0)  # use the main process for rollouts
    .resources(num_gpus=1)           # give the main process the GPU
    # optionally tune other parameters as needed
)
This setup ensures both inference and training use the GPU, avoiding idle time between phases. For more details, see the RLlib Training Guide.
For synchronous algorithms like PPO and A2C, the driver and workers can make use of the same GPU. To do this with n GPUs:
gpu_count = n
num_gpus = 0.0001 # Driver GPU
num_gpus_per_env_runner = (gpu_count - num_gpus) / num_env_runners
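To make the arithmetic concrete, here is a minimal sketch of that split as a standalone function. The function name and the tiny driver fraction are illustrative, not RLlib API; the idea is only that the driver reserves a sliver of the GPU and the remainder is divided evenly across env runners.

```python
# Hypothetical helper: split n GPUs between the driver and the env runners
# for a synchronous algorithm like PPO. Names are illustrative.
def split_gpus(gpu_count: float, num_env_runners: int):
    driver_gpus = 0.0001  # tiny fraction so the driver process can see the GPU
    per_runner = (gpu_count - driver_gpus) / num_env_runners
    return driver_gpus, per_runner

# e.g. one GPU shared by the driver and 4 env runners
driver, per_runner = split_gpus(1, 4)
```

Each runner here ends up with just under a quarter of the GPU, which Ray treats as a scheduling hint rather than a hard memory limit.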
That said, I haven’t been able to get this to work in practice:
# The num_learners setting determines how many remote Learner workers there are in your Algorithm’s LearnerGroup. If you set this to 0, your LearnerGroup only contains a local Learner that runs on the main process (and shares the compute resources with that process, usually 1 CPU). Cannot set both `num_cpus_per_learner` > 1 and `num_gpus_per_learner` > 0!
# Either set `num_cpus_per_learner` > 1 (and `num_gpus_per_learner`=0) OR
# set `num_gpus_per_learner` > 0 (and leave `num_cpus_per_learner` at its default value of 1). This is due to issues with placement group fragmentation.
.resources(num_gpus=args.num_gpus)
.learners(num_gpus_per_learner=split_gpu_trainer)
.env_runners(
num_env_runners=args.num_env_runners,
num_gpus_per_env_runner=num_gpu_per_worker,
num_cpus_per_env_runner=num_cpu_per_worker,
)
Based on my experience, I recommend writing your own function that calculates the resources based on the following factors:
The number of trials you want to run in parallel / concurrently
The number of env runners or workers (old stack) per trial
The total number of CPUs and GPUs available
In the example above, the outputs of this function are the variables split_gpu_trainer, num_gpu_per_worker, and num_cpu_per_worker.
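A rough sketch of what such a function might look like, under assumptions of my own choosing (even CPU split per trial, one CPU reserved for the driver/learner, a tiny GPU fraction for the trainer); the policy is the point, not the exact numbers:

```python
# Hypothetical resource-budgeting helper; the split policy is an assumption,
# not anything prescribed by RLlib.
def plan_resources(total_cpus: int, total_gpus: float,
                   num_trials: int, num_env_runners: int):
    cpus_per_trial = total_cpus // num_trials
    gpus_per_trial = total_gpus / num_trials
    split_gpu_trainer = 0.0001  # leave almost all of the GPU to the workers
    num_gpu_per_worker = (gpus_per_trial - split_gpu_trainer) / num_env_runners
    # Reserve one CPU for the driver/learner, split the rest across runners.
    num_cpu_per_worker = max(1, (cpus_per_trial - 1) // num_env_runners)
    return split_gpu_trainer, num_gpu_per_worker, num_cpu_per_worker

# e.g. 16 CPUs, 1 GPU, a single trial with a single env runner
trainer_gpu, worker_gpu, worker_cpu = plan_resources(16, 1.0, 1, 1)
```

The returned values then feed directly into .resources(), .learners(), and .env_runners() as in the snippet above.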
@PhilippWillms Interesting information; it’s always neat to see how the systems side of Ray/RLlib works. I ran some more tests locally, monitoring resource usage, and I think I cracked the issue. My test notes are below (repo for replication if desired):
# My local machine has 16 CPUs and 1 GPU
python run_training.py --env-name SW_1v1_env_singleplayer --env-config '{"speed": 5.0, "ep_length": 4096, "egocentric": true, "size_multiplier": 1.0, "grav_multiplier": 1.0, "target_speed": 1.0, "target_ammo": 0.0, "elliptical": true, "probabilistic_difficulty": false, "inform_critic": false}' --verbose 1 --batch-size 65536 --minibatch-size 8192 --gamma .999 --attn-dim 128 --attn-ff-dim 512 --lr 1e-4 --lambda_ .8 --vf-clip inf --stop-iters=2 --num-env-runners 1 --gpus-per-learner 0.0001 --cpus-per-env-runner 15 --envs-per-env-runner 60 --remote-worker-envs
# Okay, give an env runner 15 CPUs and check usage:
# short burst of 33% GPU usage at 11% CPU usage, followed by a burst of 90% GPU usage with 10% CPU usage throughout. Memory at 77%
# First epoch done in 78s; Second in 157s
# Let's triple the number of envs per env runner, up to 180. No other changes:
# short burst of 24% GPU usage, 11% CPU usage, followed by burst of 90% GPU usage, still 10% CPU usage throughout.
# First epoch done in 80s, second in 164s
# Okay, let's divide the number of envs per runner by three, down to 20.
# short burst of 40% GPU usage, CPU still 10% throughout.
# First epoch done in 98s, second in 200s
# Try it with 1.0 GPUs per learner, just to see if the GPU helps
# 0% GPU usage, then 90% during training. CPU isn't constant, but goes to 40% during rollouts.
# First epoch done in 89s. Second in 182s. This is the opposite of what I'd expect - denying GPU usage to workers yields (slightly) better results!
Intuitively, I’m not sure what to make of this. Neither CPU nor GPU seems to be the bottleneck during rollouts. My best guess is that I’m dealing with a less-than-perfectly optimized environment, so I’m not able to push my CPUs to 100 percent. I was using a bunch of env runners in most of my earlier tests, so I suspect a lack of parallelism across workers was responsible for some of what I saw when adding GPUs to them. I’ve also now seen firsthand that GPU resources do get reallocated properly between learners and env runners when alternating between rollouts and training.
The good news is that weight updates take up about 70% of my time for each step, and I’ve now got a better sense of how to shave off some of the remainder.
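For a rough sense of the ceiling, Amdahl’s law gives the limit on what optimizing only the rollout phase can buy. Taking the ~70% update figure from my measurements above:

```python
# If weight updates take fraction f of each step and only the rollout phase
# is sped up, the best possible overall speedup (rollout time -> 0) is 1 / f.
f_update = 0.70
max_speedup = 1 / f_update
print(round(max_speedup, 2))  # ~1.43x even with instantaneous rollouts
```

So rollout-side tuning tops out around a 1.4x improvement here; anything beyond that has to come from the update phase itself.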