APPO/IMPALA Logging

It appears that APPO and PPO return quite different stats from each trainer.train() call. Logging times/throughput is quite easy with PPO, but some of the corresponding fields are missing in APPO, specifically learn_throughput and learn_time_ms.

Additionally, the values reported under sample_time_ms are far below the actual wall-clock time. I’ve also had to dramatically reduce the batch size, since APPO seems to consume far more GPU memory than PPO.

Questions:

  1. Where are the throughput values for APPO?
  2. Why are sample times low?
  3. Why does APPO consume so much memory?
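On question 1, one possibility is that APPO reports its timing/throughput stats under a different nested path in the result dict than PPO does (e.g. under a learner-info sub-dict rather than at the top level). As a quick way to check, you can recursively walk the dict returned by trainer.train() and collect every key that looks like a timing or throughput stat. The sample_result below is a hypothetical shape for illustration, not the exact RLlib schema:

```python
def find_perf_keys(result, prefix=""):
    """Recursively collect dotted paths to keys that look like
    timing or throughput stats in a train() result dict."""
    hits = []
    for key, value in result.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            hits.extend(find_perf_keys(value, path + "."))
        elif any(s in str(key) for s in ("time", "throughput")):
            hits.append(path)
    return hits

# Hypothetical result structure (an assumption for illustration):
# stats may be nested under "timers" or "info"/"learner" instead
# of appearing as top-level fields like in PPO.
sample_result = {
    "timers": {"sample_time_ms": 12.3, "synch_weights_time_ms": 0.4},
    "info": {"learner": {"default_policy": {"train_throughput": 5000.0}}},
}
print(find_perf_keys(sample_result))
```

Running this on the real result of trainer.train() should reveal whether the learn-time stats exist under another name/path for APPO or are genuinely absent.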