It appears that APPO and PPO return quite different stats on every trainer.train() step. Logging times and throughput is easy with PPO, but some of the corresponding fields are missing from APPO's results, specifically learn_throughput and learn_time_ms.

Additionally, the values returned by sample_time_ms are far below the actual wall-clock sampling time. I've also had to dramatically reduce the batch size, as APPO seems to consume far more GPU memory than PPO.
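For reference, here is roughly how I look for the timing fields. The result of trainer.train() is a nested dict, so I walk it recursively and print every key mentioning "time" or "throughput". The sample dict below is a hand-written stand-in, not actual RLlib output:

```python
def find_timing_keys(d, prefix=""):
    """Recursively collect (path, value) pairs for keys that
    mention 'time' or 'throughput' in a nested result dict."""
    hits = []
    for k, v in d.items():
        path = f"{prefix}/{k}" if prefix else k
        if isinstance(v, dict):
            hits.extend(find_timing_keys(v, path))
        elif "time" in k or "throughput" in k:
            hits.append((path, v))
    return hits

# Stand-in for a trainer.train() result dict (illustrative values only).
result = {
    "timers": {"sample_time_ms": 12.3, "learn_time_ms": 45.6},
    "info": {"learner": {"default_policy": {"cur_lr": 0.0005}}},
    "sampler_results": {"episode_reward_mean": 100.0},
}

for path, value in find_timing_keys(result):
    print(path, value)
```

With PPO the learn_* keys show up under timers, but with APPO the equivalent entries are simply absent.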


  1. Where are the throughput values for APPO?
  2. Why are sample times low?
  3. Why does APPO consume so much memory?