Hello! I'm relatively new to RLlib, but I have managed to set up a training pipeline using the PPO algorithm. My system has a GPU (NVIDIA RTX 3080) and RLlib detects it correctly, but the GPU is barely utilized, as confirmed by nvidia-smi.
The console shows:
Logical resource usage: 5.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
while nvidia-smi shows only about 3% GPU usage.
I have racked my brain, but I cannot figure out why the GPU is not being used.
In my main script I call:
ray.init(ignore_reinit_error=True, num_gpus=1)
print(ray.get_gpu_ids())
print(ray.available_resources())
ray.available_resources() returns:
{'accelerator_type:G': 1.0, 'node:__internal_head__': 1.0, 'CPU': 16.0, 'object_store_memory': 6037658419.0, 'node:127.0.0.1': 1.0, 'memory': 12075316839.0, 'GPU': 1.0}
but ray.get_gpu_ids() returns []
I have also tested torch.cuda.is_available(), and it returns True, so PyTorch itself can see my GPU.
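Concretely, the check I ran looks roughly like this (the device_count and get_device_name calls are extra sanity checks, not part of the training code):

import torch

# Standalone sanity check that PyTorch can see the CUDA device
print(torch.cuda.is_available())      # True on my machine
print(torch.cuda.device_count())      # 1
print(torch.cuda.get_device_name(0))  # reports the RTX 3080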
My training config looks like this:
def policy_mapping(agent_id, episode=None, worker=None, **kwargs):
    return "shared_policy"

def get_ma_training_config():
    ModelCatalog.register_custom_model("traffic_model", MyTrafficModel)

    temp_env = TrafficLightMAEnv({"num_lights": 5})

    config = (
        PPOConfig()
        .api_stack(
            enable_rl_module_and_learner=False,
            enable_env_runner_and_connector_v2=False
        )
        .environment(
            env=TrafficLightMAEnv,
            env_config={
                "num_lights": 5,
                "sumo_cfg": r"C:\Users\pc\Documents\Trafic\data\sumo\simulation.sumocfg",
                "max_steps": 20000,
            }
        )
        .framework("torch")
        .resources(num_gpus=1)
        .env_runners(
            num_env_runners=4,
            num_envs_per_env_runner=2,
            rollout_fragment_length=200,
            sample_timeout_s=180,
        )
        .training(
            train_batch_size=6000,
            num_epochs=10,
            gamma=0.99,
            lr=3e-4,
            lambda_=0.95,
            clip_param=0.2,
            vf_clip_param=10.0,
            entropy_coeff=0.01,
        )
        .callbacks(MetricsCallback)
        .reporting(
            metrics_num_episodes_for_smoothing=100,
            keep_per_episode_custom_metrics=True,
            min_sample_timesteps_per_iteration=1000,
            min_time_s_per_iteration=1,
            metrics_episode_collection_timeout_s=60,
        )
        .debugging(
            logger_config={
                "type": "ray.tune.logger.TBXLogger",
                "logdir": "./logs",
                "log_metrics_tables": True,
            }
        )
        .multi_agent(
            policies={
                "shared_policy": (
                    None,
                    temp_env.single_light_obs_space,
                    temp_env.single_light_act_space,
                    {
                        "model": {
                            "custom_model": "traffic_model",
                        },
                    },
                )
            },
            policy_mapping_fn=policy_mapping,
        )
    )
    return config
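For context, the config is consumed in the usual way; this is a simplified sketch of how I run it (details of my actual loop omitted):

algo = get_ma_training_config().build()
for _ in range(100):
    result = algo.train()   # this is where I expect the GPU to be used
    print(result["training_iteration"])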
I have also tried setting num_gpus_per_env_runner, but when I do that the console shows:
Logical resource usage: 0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
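Concretely, that variant changed only the env_runners block; the fractional GPU value below is a placeholder for the amounts I experimented with:

.env_runners(
    num_env_runners=4,
    num_envs_per_env_runner=2,
    num_gpus_per_env_runner=0.2,  # placeholder; I tried several fractional values
    rollout_fragment_length=200,
    sample_timeout_s=180,
)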
I am also using a custom model:
class TrafficModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs, model_config, name)
        nn.Module.__init__(self)

        input_size = 15
        hidden_size = 256

        self.input_norm = nn.LayerNorm(input_size)
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.LayerNorm(hidden_size),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.LayerNorm(hidden_size)
        )

        # Heads: discrete phase logits, continuous duration (mean + log std), and value
        self.discrete_head = nn.Linear(hidden_size, 3)
        self.continuous_mean = nn.Linear(hidden_size, 1)
        self.continuous_log_std = nn.Parameter(torch.zeros(1))
        self.value_branch = nn.Linear(hidden_size, 1)

        self._value_out = None

    def forward(self, input_dict, state, seq_lens):
        obs = input_dict["obs_flat"].float()
        x = self.input_norm(obs)
        features = self.network(x)

        phase_logits = self.discrete_head(features)

        duration_mean = self.continuous_mean(features)
        # Scale the duration mean into the [5, 60] range
        duration_mean = torch.tanh(duration_mean) * 27.5 + 32.5

        log_std = torch.clamp(self.continuous_log_std, -20.0, 2.0)
        log_std = log_std.expand_as(duration_mean)

        model_out = torch.cat([phase_logits, duration_mean, log_std], dim=-1)
        self._value_out = self.value_branch(features).squeeze(-1)
        return model_out, state

    def value_function(self):
        return self._value_out
I have tried tweaking that as well, but whatever I do, the forward passes are still executed on the CPU when I log which device is running the model.
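The device logging I mention is just a temporary print I add at the top of TrafficModel.forward(), along these lines:

# Temporary debug lines at the top of TrafficModel.forward():
print("model device:", next(self.parameters()).device)  # always prints "cpu"
print("obs device:", input_dict["obs_flat"].device)     # always prints "cpu"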
I have no idea why it is not working and cannot seem to figure out why.
Does anybody have any ideas?