GPU Detected but Not Utilized in Ray RLlib with PPO

High: It blocks me from completing my task.

Hello, I’ve been working with Ray RLlib on my latest project, using the PPO, DQN, and SAC algorithms. My setup includes a CUDA-capable GPU that both PyTorch and Ray recognize correctly. However, the GPU is detected but not used during training: nvidia-smi shows almost no GPU memory in use (3MiB of 24564MiB) and 0% compute utilization.

Relevant code part:

import ray
import torch
from ray.rllib.algorithms.ppo import PPOConfig

# Start Ray with one GPU if PyTorch can see a CUDA device.
use_gpu = torch.cuda.is_available()
if use_gpu:
    print(f"CUDA is available. Number of GPUs: {torch.cuda.device_count()}")
    print("GPU Name:", torch.cuda.get_device_name(0))
    ray.init(num_gpus=1)
else:
    print("CUDA is not available. Running on CPU.")
    ray.init()

# algorithm_name, the batch-size settings, callback_factory and
# environment_config are defined elsewhere in my script (not shown).
if algorithm_name == "PPO":
    algo = (
        PPOConfig()
        .training(
            train_batch_size=train_batch_size_input,
            sgd_minibatch_size=sgd_minibatch_size,
            num_sgd_iter=num_sgd_iter,
            clip_param=0.2,
        )
        .rollouts(num_rollout_workers=1)
        .resources(num_gpus=1 if use_gpu else 0)  # request the GPU for training
        .framework("torch")  # or .framework("tf") for TensorFlow
        .callbacks(callback_factory)
        .environment(env="bertrand", env_config=environment_config)
        .multi_agent(
            policies=["agent0", "agent1"],
            # Each agent ID maps to the policy of the same name.
            policy_mapping_fn=(lambda agent_id, *args, **kwargs: agent_id),
        )
        .build()
    )

And console output:

CUDA is available. Number of GPUs: 1
GPU Name: NVIDIA GeForce RTX 4090
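
The output above only confirms that PyTorch sees the GPU, not that the policy weights were ever moved onto it. A check along the following lines might narrow that down. It is only a sketch: `count_param_devices` is a hypothetical helper name, and the commented-out usage assumes that `algo.get_policy("agent0")` returns a policy whose `model` attribute is a torch module, which I have not verified against ray 2.6.1.

```python
from collections import Counter
from types import SimpleNamespace


def count_param_devices(params):
    """Count how many parameters sit on each device.

    `params` is any iterable of objects with a `.device` attribute,
    for example the result of `torch.nn.Module.parameters()`.
    """
    return Counter(str(p.device) for p in params)


if __name__ == "__main__":
    # Stand-in objects so the helper can be tried without torch or a GPU.
    fake_params = [SimpleNamespace(device="cpu"), SimpleNamespace(device="cpu")]
    print(dict(count_param_devices(fake_params)))

    # Assumed usage against the `algo` built above (an assumption, see lead-in):
    #   policy = algo.get_policy("agent0")
    #   print(count_param_devices(policy.model.parameters()))
```

If every parameter reports `cpu`, the training model was never placed on the GPU, which would match the idle nvidia-smi reading; a `cuda:0` entry would instead point at the GPU simply being underused.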

GPU usage during the whole task (nvidia-smi output):

+--------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08    Driver Version: 545.23.08    CUDA Version: 12.3               |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:01:00.0 Off |                  Off |
|  0%   42C    P8              11W / 450W |      3MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+--------------------------------------------------------------------------------------+
| Processes:                                                                          |
|  GPU   GI   CI        PID   Type   Process name                          GPU Memory |
|        ID   ID                                                           Usage      |
|=====================================================================================|
|  No running processes found                                                         |
+--------------------------------------------------------------------------------------+
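
A single nvidia-smi snapshot can miss short bursts of activity, so sampling utilization repeatedly while training runs may give a more reliable picture. The sketch below uses nvidia-smi's `--query-gpu` CSV mode; the helper names `parse_gpu_line` and `sample_gpus` are made up for illustration, and the code assumes the two queried fields come back as plain integers (anything else, such as `[N/A]`, is reported as `None`).

```python
import shutil
import subprocess

# Ask nvidia-smi for GPU utilization (%) and used memory (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Turn one CSV line such as '0, 3' into (utilization, memory_mib)."""
    util, mem = (field.strip() for field in line.split(","))
    # Fields that are not plain integers (e.g. '[N/A]') become None.
    return tuple(int(v) if v.isdigit() else None for v in (util, mem))


def sample_gpus():
    """Return one (utilization, memory_mib) tuple per GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return None
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    print(sample_gpus())
```

Calling `sample_gpus()` once per training iteration, for example right after each `algo.train()` call, would show whether utilization ever rises above the idle values in the table.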

Package versions:

  • Python 3.10
  • ray 2.6.1
  • CUDA 12.3
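
The list above does not include the installed torch version, and the "CUDA 12.3" figure comes from the nvidia-smi header, which reports the CUDA version the driver supports rather than the one a given torch build was compiled against. A small script like the following could collect the installed versions in one place. It is a sketch: `collect_versions` is a hypothetical helper, and the package names passed to it are only the ones relevant to this post.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Map 'python' and each package name to its installed version string."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(["ray", "torch"]).items():
        print(f"{name}: {version}")
    # With torch installed, `torch.version.cuda` additionally reports the
    # CUDA version that build was compiled against (None for CPU-only builds).
```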