GPU acceleration cannot be used with Ray and Tune when training PPO

I set up Ray like this:

import ray
from ray import air, tune

ray.init(num_gpus=1)

# args.run, config, and stop are defined earlier in my script
results = tune.Tuner(
    args.run,
    param_space=config,
    run_config=air.RunConfig(
        stop=stop,
        verbose=2,
        checkpoint_config=air.CheckpointConfig(checkpoint_at_end=True),
    ),
).fit()

but the output shows that the GPU is not being used. Why?
Logical resource usage: 36.0/104 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:RTX)

Hi @Keik1999, it is hard to give an opinion without a reproducible example. Could you provide one?

Thanks for your kind help. I figured out this problem today with my colleagues in my research group. I found that Ray allocates the entire GPU to a single process by default if we use:
.resources(num_gpus=1)

but if I use Tune to run trials with different parameters, this setting has no effect and Ray only runs the trials in parallel on multiple CPUs.

So I need to use:
.resources(num_gpus_per_worker=0.05)

Then everything works:

Logical resource usage: 32.0/256 CPUs, 0.8000000000000002/1 GPUs

By the way, I am using ray==2.8.
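
For reference, here is a minimal sketch of how that resources call fits into a full Tuner setup on the old API stack; the environment name, worker count, and learning rates below are just placeholders, not the values from my actual experiment:

import ray
from ray import air, tune
from ray.rllib.algorithms.ppo import PPOConfig

ray.init(num_gpus=1)

config = (
    PPOConfig()
    .environment("CartPole-v1")        # placeholder environment
    .rollouts(num_rollout_workers=4)   # placeholder worker count
    # give each rollout worker a small slice of the single GPU
    .resources(num_gpus_per_worker=0.05)
    # a swept hyperparameter so Tune launches several trials in parallel
    .training(lr=tune.grid_search([1e-4, 5e-5]))
)

results = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=air.RunConfig(
        stop={"training_iteration": 10},
        checkpoint_config=air.CheckpointConfig(checkpoint_at_end=True),
    ),
).fit()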


Great that you found the solution yourself. Note that num_gpus_per_worker is usually only needed if inference is expensive or your environment can use GPUs. Most effective is usually num_gpus_per_learner_worker if _enable_new_api_stack=True. If the latter is False, num_gpus defines the number of GPUs for the local worker, where training usually happens.
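
To make the distinction concrete, here is a minimal sketch of both setups. It assumes Ray 2.8, where _enable_new_api_stack is passed through AlgorithmConfig.experimental(); the GPU and worker counts are illustrative only:

from ray.rllib.algorithms.ppo import PPOConfig

# Old API stack (the default): num_gpus is assigned to the local worker,
# which is where training happens.
old_stack_config = PPOConfig().resources(num_gpus=1)

# New API stack (_enable_new_api_stack=True): training moves to Learner
# workers, so the GPU is requested per learner worker instead.
new_stack_config = (
    PPOConfig()
    .experimental(_enable_new_api_stack=True)
    .resources(num_learner_workers=1, num_gpus_per_learner_worker=1)
)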