GPU Detected but Not Utilized in Ray RLlib with PPO

High: It blocks me from completing my task.

Hello, I’ve been working with Ray RLlib for my latest project, specifically with the PPO, DQN, and SAC algorithms. My setup includes a CUDA-capable GPU, which is correctly recognized by both PyTorch and Ray. However, I’ve run into an issue where the GPU is detected but not utilized during training: nvidia-smi shows essentially no GPU memory usage and 0% compute utilization.

Relevant code part:

import ray
import torch

from ray.rllib.algorithms.ppo import PPOConfig

use_gpu = torch.cuda.is_available()
if use_gpu:
    print(f"CUDA is available. Number of GPUs: {torch.cuda.device_count()}")
    print("GPU Name:", torch.cuda.get_device_name(0))
    ray.init(num_gpus=1)
else:
    print("CUDA is not available. Running on CPU.")
    ray.init()

if algorithm_name == "PPO":
    algo = (
        PPOConfig()
        .training(train_batch_size=train_batch_size_input, sgd_minibatch_size=sgd_minibatch_size,
                  num_sgd_iter=num_sgd_iter, clip_param=0.2)
        .rollouts(num_rollout_workers=1)
        # num_gpus is the GPU share requested for the Algorithm (learner) process.
        .resources(num_gpus=1 if use_gpu else 0)
        .framework("torch")  # or .framework("tf") for TensorFlow
        .callbacks(callback_factory)
        .environment(env="bertrand", env_config=environment_config)
        .multi_agent(
            policies=["agent0", "agent1"],
            policy_mapping_fn=(lambda agent_id, *args, **kwargs: agent_id))
        .build()
    )

And the console output:

CUDA is available. Number of GPUs: 1
GPU Name: NVIDIA GeForce RTX 4090

GPU usage during the entire run, as reported by nvidia-smi:

+--------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08    Driver Version: 545.23.08    CUDA Version: 12.3               |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:01:00.0 Off |                  Off |
|  0%   42C    P8              11W / 450W |      3MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+--------------------------------------------------------------------------------------+
| Processes:                                                                          |
|  GPU   GI   CI        PID   Type   Process name                          GPU Memory |
|        ID   ID                                                           Usage      |
|=====================================================================================|
|  No running processes found                                                         |
+--------------------------------------------------------------------------------------+

Packages:

  • Python 3.10
  • ray 2.6.1
  • CUDA 12.3
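
For completeness, Ray's own view of the resources can be confirmed right after initialization. The following is a minimal standalone sketch (ray.cluster_resources() and ray.available_resources() are standard Ray APIs; both should list a GPU entry on this machine):

import ray

# Mirrors the ray.init(num_gpus=1) call from the script above.
ray.init(num_gpus=1)

# Both dicts should contain a "GPU" key; the available count drops as Ray actors
# (e.g. rollout workers with num_gpus_per_worker > 0) claim fractions of it.
print(ray.cluster_resources())
print(ray.available_resources())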

Try experimenting with the parameters shown in the examples below. Recall that RolloutWorkers (old API stack) and Learners (new API stack) need to be configured with the resources they consume, in terms of (fractional) CPUs and (fractional) GPUs.

Old API stack

.resources(
    num_gpus=args.num_gpus, num_cpus_per_worker=4, num_gpus_per_worker=0.3
)
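
Applied to the PPO configuration from the question (old API stack on ray 2.6.1), that could look roughly like the sketch below. It reuses the variables from the question's snippet (use_gpu, train_batch_size_input, callback_factory, environment_config, etc.); the 0.7/0.3 split between the learner process and the single rollout worker is only an assumption, chosen so that the fractions fit on the one physical GPU:

from ray.rllib.algorithms.ppo import PPOConfig

algo = (
    PPOConfig()
    .training(train_batch_size=train_batch_size_input,
              sgd_minibatch_size=sgd_minibatch_size,
              num_sgd_iter=num_sgd_iter, clip_param=0.2)
    .rollouts(num_rollout_workers=1)
    # The learner process gets most of the GPU, the rollout worker a fraction;
    # set num_gpus_per_worker=0 to keep sampling on the CPU instead.
    .resources(num_gpus=0.7 if use_gpu else 0,
               num_cpus_per_worker=4,
               num_gpus_per_worker=0.3 if use_gpu else 0)
    .framework("torch")
    .callbacks(callback_factory)
    .environment(env="bertrand", env_config=environment_config)
    .multi_agent(
        policies=["agent0", "agent1"],
        policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id)
    .build()
)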

New API stack

.resources(
    num_gpus=args.num_gpus,
)
.learners(
    num_gpus_per_learner=1
    # Cannot set both `num_cpus_per_learner` > 1 and `num_gpus_per_learner` > 0!
    # Either set `num_cpus_per_learner` > 1 (and `num_gpus_per_learner`=0) OR
    #   set `num_gpus_per_learner` > 0 (and leave `num_cpus_per_learner` at its default value of 1).
    # This is due to issues with placement group fragmentation.
    # See https://github.com/ray-project/ray/issues/35409 for more details.
)
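
Whichever stack is used, it helps to verify afterwards where the policy's model actually lives. Below is a minimal sketch for the old-stack setup from the question; it assumes the algo object built above, the torch framework, and the "agent0" policy ID. get_policy() and the model attribute are standard on RLlib torch policies, though exact attributes can differ between versions:

# The device should read "cuda:0" once the GPU settings have taken effect.
policy = algo.get_policy("agent0")
print(next(policy.model.parameters()).device)

# Run a few iterations and watch nvidia-smi in a second terminal: GPU memory and
# utilization should now be non-zero during the learning phase of each iteration.
for _ in range(3):
    result = algo.train()
    print(result["episode_reward_mean"])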