Hello! I have a question about multi-GPU training performance in RLlib. Thanks in advance for your answers!
RLlib's IMPALA supports multi-GPU training. I trained with the config provided in the tuned example pong-impala-fast.yaml and got the expected training throughput (around 33k transitions per second). However, when I doubled the resources to 256 workers and 4 GPUs and changed nothing else, the training throughput only reached ~35k. I expected it to roughly double (to ~66k), since sampling and training in IMPALA are completely asynchronous.
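For concreteness, here is a minimal sketch of the override I mean. The `num_workers` and `num_gpus` keys are standard RLlib trainer config options; the baseline values of 128 workers / 2 GPUs and the `PongNoFrameskip-v4` env are my assumptions about what the tuned example uses, so treat the exact numbers as illustrative.

```yaml
# Sketch of the scaled-up run, derived from pong-impala-fast.yaml.
# Baseline values are assumed; only the resource keys were changed.
pong-impala-scaled:
    env: PongNoFrameskip-v4
    run: IMPALA
    config:
        num_workers: 256   # doubled (assumed baseline: 128)
        num_gpus: 4        # doubled (assumed baseline: 2)
```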
What's the problem here? Are there any settings that could be tuned to get a better result? Is there a standard multi-GPU training benchmark for RLlib algorithms that scales up to 8 GPUs?