IMPALA seems inefficient (slow), how to properly initialize?

Hi!

We have been struggling to make full use of our resources with various algorithms. Our environment should be relatively cheap to sample from, since we have a small discrete action space and small input states.

We switched from PPO to IMPALA because it samples and updates asynchronously, but looking at our GPU (where the learner runs) it doesn't seem to. Only once every couple of minutes do we see a small spike for the update in nvtop, and in between these spikes the CPUs are at 100%. With asynchronous sampling we would expect a constant stream of data to the GPU rather than a short spike that quickly drops back to 0% utilization.

We have 28 workers with 100 envs each. It seems like it is harder to fill these 28 smaller, separate train batches than to fill one big train batch as in normal PPO (some rough numbers below the config).

IMPALA resources config:
"num_gpus": 1,
"num_workers": 28,
"num_cpus_per_worker": 1,
"num_envs_per_worker": 100,
"train_batch_size": 20000,
"rollout_fragment_length": 128,
"batch_mode": "truncate_episodes",
"num_sgd_iter": 1,
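
Some rough arithmetic on those numbers, assuming each worker rollout returns num_envs_per_worker * rollout_fragment_length steps with truncate_episodes (my understanding of how RLlib builds the batches, so please correct me if that's off):

num_workers = 28
num_envs_per_worker = 100
rollout_fragment_length = 128
train_batch_size = 20000

# Env steps returned by a single worker per sample call (assumption above).
steps_per_worker_rollout = num_envs_per_worker * rollout_fragment_length
print(steps_per_worker_rollout)                     # 12800
# Fewer than two worker rollouts already cover one train batch.
print(train_batch_size / steps_per_worker_rollout)  # ~1.56
# Steps produced by one full round of all 28 workers.
print(num_workers * steps_per_worker_rollout)       # 358400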

Does somebody know why the GPU isn't utilized almost all the time, even though this is an asynchronous algorithm?

Furthermore, does somebody have any guidelines on how to improve the speed of the algorithm in general? It feels like we are underutilizing our hardware.
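
In case it helps to narrow things down: is something like the sketch below a reasonable way to check whether the samplers or the learner are the bottleneck? This assumes an RLlib version where the trainer still lives under ray.rllib.agents.impala and where the per-iteration results report timing stats under keys like "timers" and "sampler_perf" (names may differ between versions); the CartPole env is just a placeholder for ours.

import ray
from ray.rllib.agents.impala import ImpalaTrainer

ray.init()

# Placeholder env plus our config from above, for illustration only.
trainer = ImpalaTrainer(env="CartPole-v0", config={
    "num_gpus": 1,
    "num_workers": 28,
    "num_cpus_per_worker": 1,
    "num_envs_per_worker": 100,
    "train_batch_size": 20000,
    "rollout_fragment_length": 128,
    "batch_mode": "truncate_episodes",
    "num_sgd_iter": 1,
})

for _ in range(10):
    result = trainer.train()
    # Assumed result keys for timing stats; they may be named differently
    # in other RLlib versions.
    print(result.get("timers", {}))        # e.g. sample vs. learn time per iteration
    print(result.get("sampler_perf", {}))  # e.g. mean env-step / inference latency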

Kind regards,
Dylan Prins

Hi @DJprins,

I do not have much experience with this myself, but some people have found the Ray timeline useful for debugging performance issues.

https://docs.ray.io/en/latest/ray-core/troubleshooting.html#visualizing-tasks-in-the-ray-timeline
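
In case it is useful, my understanding from those docs is that you run some training first, then dump the trace and open it in chrome://tracing (rough sketch, the filename is arbitrary):

import ray

ray.init()

# ... run a few IMPALA training iterations here ...

# Dump a Chrome-trace-compatible timeline of all tasks/actors executed so far.
# Open the resulting JSON in chrome://tracing (or the Perfetto UI) to see
# where the rollout workers and the learner spend their time.
ray.timeline(filename="/tmp/timeline.json")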