IMPALA seems inefficient (slow): how to initialize it properly?


We have been struggling to correctly use all our resources with various algorithms. Our environment should be relatively cheap to sample from since we have a small discrete action space and small input states.

We switched from PPO to IMPALA because it samples and updates asynchronously, but judging by our GPU (where the learner runs), that does not seem to be happening. In nvtop we see only a brief spike for the update once every couple of minutes, and between these spikes the CPU is at 100%. With asynchronous sampling we would expect a constant data feed to the GPU, not a brief spike after which GPU utilization quickly drops to 0.

We have 28 workers with 100 envs each. It seems harder to fill these 28 smaller, separate train batches than to fill one big train batch in normal PPO.

IMPALA resources config:
'num_gpus': 1,
'num_workers': 28,
'num_cpus_per_worker': 1,
'num_envs_per_worker': 100,
'train_batch_size': 20000,
'rollout_fragment_length': 128,
'batch_mode': 'truncate_episodes',
'num_sgd_iter': 1,
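
For reference, here is a rough back-of-the-envelope sketch of how many environment steps these settings imply. It assumes that each worker returns one fragment of rollout_fragment_length steps per env on every sample call, which is how we understand RLlib's vectorized sampling but have not verified for this version. The variable names other than the config keys are only illustrative.

```python
# Values copied from the config above.
num_workers = 28
num_envs_per_worker = 100
rollout_fragment_length = 128
train_batch_size = 20000

# Assumption: one sample call on a worker yields one fragment per env.
steps_per_worker_sample = num_envs_per_worker * rollout_fragment_length

# Steps collected if every worker delivers one sample.
steps_per_full_round = num_workers * steps_per_worker_sample

# Worker samples needed to reach train_batch_size (ceiling division).
worker_samples_per_train_batch = -(-train_batch_size // steps_per_worker_sample)

print("steps per worker sample:", steps_per_worker_sample)
print("steps per full round of workers:", steps_per_full_round)
print("worker samples per train batch:", worker_samples_per_train_batch)
```

Under that assumption, a single worker has to step all 100 of its envs 128 times before it can hand anything to the learner, so the learner may sit idle while the CPU-bound workers finish their fragments.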

Does anybody know why the GPU isn't utilized almost all the time, even though this is an asynchronous algorithm?

Furthermore, does anybody have guidelines on how to improve the speed of the algorithms in general? It feels like we are underutilizing our hardware.

Kind regards,
Dylan Prins

Hi @DJprins,

I do not have much experience with it, but some people have found ray timeline useful for debugging performance issues.
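
In case it helps, here is a minimal sketch for summarizing such a timeline dump once you have it as a JSON file. It assumes the file is Chrome-trace-style JSON, i.e. a list of events with "name", "ph" and "dur" (microseconds) fields; I have not checked that against every Ray version. summarize_trace and summarize_trace_file are hypothetical helper names, not part of Ray.

```python
import json
from collections import defaultdict


def summarize_trace(events):
    """Sum durations per event name, longest total first.

    Assumes Chrome-trace-style dicts; only complete events
    (ph == "X") that carry a "dur" field are counted.
    """
    totals = defaultdict(float)
    for event in events:
        if event.get("ph") == "X" and "dur" in event:
            totals[event.get("name", "<unnamed>")] += event["dur"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def summarize_trace_file(path):
    """Load a JSON trace file (assumed to be a list of events) and summarize it."""
    with open(path) as handle:
        return summarize_trace(json.load(handle))
```

Calling summarize_trace_file on the dumped file should show which task names account for most of the recorded time, which may hint at whether sampling or the learner update is the bottleneck.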