High: It blocks me from completing my task.
Hello, I’ve been working with Ray RLlib for my latest project, specifically using the PPO, DQN, and SAC algorithms. My setup includes a CUDA-capable GPU, which both PyTorch and Ray recognize correctly. However, the GPU is detected but not utilized during training: nvidia-smi shows almost no GPU memory usage and 0% compute utilization.
Relevant part of the code:
import ray
import torch
from ray.rllib.algorithms.ppo import PPOConfig

use_gpu = torch.cuda.is_available()
if use_gpu:
    print(f"CUDA is available. Number of GPUs: {torch.cuda.device_count()}")
    print("GPU Name:", torch.cuda.get_device_name(0))
    ray.init(num_gpus=1)
else:
    print("CUDA is not available. Running on CPU.")
    ray.init()

if algorithm_name == "PPO":
    algo = (
        PPOConfig()
        .training(
            train_batch_size=train_batch_size_input,
            sgd_minibatch_size=sgd_minibatch_size,
            num_sgd_iter=num_sgd_iter,
            clip_param=0.2,
        )
        .rollouts(num_rollout_workers=1)
        .resources(num_gpus=1 if use_gpu else 0)
        .framework("torch")  # or .framework("tf") for TensorFlow
        .callbacks(callback_factory)
        .environment(env="bertrand", env_config=environment_config)
        .multi_agent(
            policies=["agent0", "agent1"],
            policy_mapping_fn=(lambda agent_id, *args, **kwargs: agent_id),
        )
        .build()
    )
And console output:
CUDA is available. Number of GPUs: 1
GPU Name: NVIDIA GeForce RTX 4090
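The output above only confirms that PyTorch can see the GPU. To check whether the policy weights actually end up on it, a small diagnostic along these lines might help. It is only a sketch: `count_param_devices` is a hypothetical helper name, and the commented usage assumes the built `algo` exposes `get_policy(...)` and that each policy carries a torch `model`, which may not hold for every RLlib version or API stack.

```python
from collections import Counter


def count_param_devices(named_params):
    """Count parameters per device string, e.g. {'cuda:0': 12} or {'cpu': 12}.

    `named_params` is any iterable of (name, tensor-like) pairs whose second
    item has a `.device` attribute, such as `model.named_parameters()`.
    """
    return Counter(str(param.device) for _name, param in named_params)


# Hypothetical usage with the `algo` built above (assumption, not run here):
# for policy_id in ("agent0", "agent1"):
#     model = algo.get_policy(policy_id).model
#     print(policy_id, dict(count_param_devices(model.named_parameters())))
```

If every parameter reports `cpu`, the learner is presumably running on the CPU even though CUDA is available.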
GPU usage during the whole training run (nvidia-smi output):
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:01:00.0 Off |                  Off |
|  0%   42C    P8              11W / 450W |      3MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
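Since a single nvidia-smi snapshot can miss short bursts of activity, it may be more reliable to sample the GPU repeatedly while training runs. The sketch below uses nvidia-smi's `--query-gpu` CSV mode; the helper names `parse_gpu_stats` and `read_gpu_stats` are made up for illustration, and the parser assumes every field comes back as a plain integer (some GPUs report `[N/A]` for certain fields, which this does not handle).

```python
import subprocess

QUERY_FIELDS = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem_used, mem_total = (int(field.strip()) for field in line.split(","))
        stats.append(
            {"util_pct": util, "mem_used_mib": mem_used, "mem_total_mib": mem_total}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return parsed stats (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Calling `read_gpu_stats()` from a background thread or a separate terminal loop every second or so during `algo.train()` would show whether utilization ever rises above the idle values in the table above.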
Versions:
- Python 3.10
- ray 2.6.1
- CUDA 12.3