It appears that APPO and PPO return quite different stats on every trainer.train() step. Logging times and throughput is easy with PPO, but some of the corresponding fields are missing from APPO's results, specifically learn_throughput and learn_time_ms.

Additionally, the values returned by sample_time_ms are far below the actual wall-clock sampling time. I've also had to dramatically reduce the batch size, as APPO seems to consume far more GPU memory than PPO.
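For reference, here is roughly how I look for the timing fields. The result of trainer.train() is a nested dict, so I walk it recursively and print every key mentioning "time" or "throughput". The sample dict below is a hand-written stand-in, not actual RLlib output:

```python
def find_timing_keys(d, prefix=""):
    """Recursively collect (path, value) pairs for keys that
    mention 'time' or 'throughput' in a nested result dict."""
    hits = []
    for k, v in d.items():
        path = f"{prefix}/{k}" if prefix else k
        if isinstance(v, dict):
            hits.extend(find_timing_keys(v, path))
        elif "time" in k or "throughput" in k:
            hits.append((path, v))
    return hits

# Stand-in for a trainer.train() result dict (illustrative values only).
result = {
    "timers": {"sample_time_ms": 12.3, "learn_time_ms": 45.6},
    "info": {"learner": {"default_policy": {"cur_lr": 0.0005}}},
    "sampler_results": {"episode_reward_mean": 100.0},
}

for path, value in find_timing_keys(result):
    print(path, value)
```

With PPO the learn_* keys show up under timers, but with APPO the equivalent entries are simply absent.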


  1. Where are the throughput values for APPO?
  2. Why are sample times low?
  3. Why does APPO consume so much memory?