GPU acceleration cannot be used with Ray and Tune when training PPO

I set up Ray Tune like this:

results = tune.Tuner(

but the log shows that the GPU is not being used. Why?
Logical resource usage: 36.0/104 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:RTX)
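For reference, the snippet above is truncated; a minimal Tune + PPO setup along these lines would look roughly like the sketch below (the environment and worker counts are illustrative assumptions, not the poster's actual values):

```python
# Sketch of a typical Tune + PPO setup (illustrative; the original
# snippet in this post is truncated, so all arguments are assumptions).
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")      # assumed env for illustration
    .rollouts(num_rollout_workers=4) # assumed worker count
)

results = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
).fit()
```

Without any GPU settings in the config, a run like this will show `0/1 GPUs` in the logical resource usage, as in the log above.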

Hi @Keik1999, it is hard to give an opinion without a reproducible example. Could you provide one?

Thanks for your kind help. I figured out this problem today with my colleagues in the research group. I found that Ray allocates the entire GPU to a single trial by default if we use this code:

But if I use Tune to run trials with different parameters, this setting has no effect and Ray runs the trials in parallel on CPUs only.

So I need to use:

Then everything is OK:

Logical resource usage: 32.0/256 CPUs, 0.8000000000000002/1 GPUs
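The fix the poster describes (the actual snippet is missing from the post) is most likely fractional GPU allocation per trial, so that concurrent Tune trials can share the single GPU. A sketch, assuming 4 concurrent trials with 0.2 GPU each, which matches the `0.8/1 GPUs` in the log above:

```python
# Sketch of fractional GPU sharing across Tune trials (an assumption about
# what the missing snippet did; the numbers are illustrative).
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # assumed env for illustration
    .resources(num_gpus=0.2)     # each trial's trainer gets 0.2 of the GPU
)

results = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    tune_config=tune.TuneConfig(num_samples=4),  # 4 trials -> 0.8 GPU total
).fit()
```

With whole-GPU allocation (`num_gpus=1`) only one trial can hold the GPU at a time; fractional values let Tune schedule several GPU-using trials concurrently.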

By the way, I use ray==2.8.


Great that you found the solution yourself. Note that num_gpus_per_worker is usually only needed if inference is expensive or your environment itself can use GPUs. Most effective is usually num_gpus_per_learner_worker if _enable_new_api_stack=True. If the latter is False, num_gpus defines the number of GPUs for the local worker, where training usually happens.