Different hardware usage of rollout workers during sampling on a cluster

I'm training some PPO policies with a custom env on a small cluster with 1 head node and 3 nodes for rollout workers. For some reason, one of my 3 rollout worker nodes doesn't seem to use the GPU during sampling and shows increased CPU usage instead (see the attached screenshot). Since my env uses rendering software that can run on the GPU, the CPU, or both, it looks like Ray somehow prevents the env from using the GPU on that node, so it falls back to rendering on the CPU.
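To check whether Ray is masking the GPU on that worker, I could print what the worker process actually sees from inside the env. This is only a rough sketch (MyRenderEnv is just a stand-in for my custom rendering env), but it would show whether Ray has assigned a GPU to the actor running the env:

import os
import ray

class MyRenderEnv:  # stand-in for my custom rendering env
    def __init__(self, config=None):
        # GPU ids Ray has assigned to the worker process running this env;
        # an empty list means this actor got no GPU share.
        print("ray.get_gpu_ids():", ray.get_gpu_ids())
        # Ray restricts GPU visibility for actors via CUDA_VISIBLE_DEVICES, so the
        # rendering backend may silently fall back to the CPU if this is empty.
        print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))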

My config for resources and rollouts looks like this:

from ray.rllib.algorithms.ppo import PPOConfig

train_config = PPOConfig()\
    .resources(
        num_gpus=0.5,
        num_gpus_per_learner_worker=1.0,
        num_gpus_per_worker=1.0,
        num_cpus_per_worker=8,
        placement_strategy="SPREAD",
    )\
    .rollouts(
        num_rollout_workers=3,
        num_envs_per_worker=1,
    )
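In case it helps, I can confirm what resources Ray sees on the cluster with the standard resource report (this is just the generic Ray helpers, nothing specific to my setup):

import ray

ray.init(address="auto")  # connect to the already running cluster
print(ray.cluster_resources())    # total CPUs/GPUs Ray knows about across the 4 nodes
print(ray.available_resources())  # what is left once the workers have been placed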

Is my config wrong? I'd be happy if someone could help me understand what's going wrong here!

Edit: Corrected the screenshot showing the situation

I found a somewhat “working” solution with this config:

train_config = PPOConfig()\
    .resources(
        num_gpus=0.1,
        num_gpus_per_learner_worker=0.8,
        num_gpus_per_worker=1.0,
        num_cpus_per_worker=16,
    )\
    .rollouts(
        num_rollout_workers=2,
        num_envs_per_worker=1,
    )

It seems that when I don't specify num_gpus and num_gpus_per_learner_worker, PPO.train doesn't use any GPU. However, if I set num_gpus to 1.0, PPO.train is moved from the head node to another node (which I don't want). Is there any documentation that explains how Ray places the workloads depending on the arguments in rollouts() and resources()?
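For completeness, this is roughly how I've been trying to see where the RLlib actors end up while training runs (Ray's state API; I'm not sure the field names are identical across Ray versions, so treat this as a sketch):

from ray.util.state import list_actors

# List the live RLlib actors (e.g. the RolloutWorkers) together with the
# node each one landed on, to see how Ray distributed them.
for actor in list_actors(filters=[("state", "=", "ALIVE")]):
    print(actor.class_name, actor.node_id)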