It appears that APPO and PPO return quite different stats on every trainer.train() step. Logging times and throughput is easy with PPO, but some of the corresponding fields are missing from APPO's results, specifically learn_throughput and learn_time_ms.

Additionally, the values returned by sample_time_ms are far below the actual wall-clock sampling time. I've also had to dramatically reduce the batch size, as APPO seems to consume far more GPU memory than PPO.
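For reference, here is roughly how I look for the timing fields. The result of trainer.train() is a nested dict, so I walk it recursively and print every key mentioning "time" or "throughput". The sample dict below is a hand-written stand-in, not actual RLlib output:

```python
def find_timing_keys(d, prefix=""):
    """Recursively collect (path, value) pairs for keys that
    mention 'time' or 'throughput' in a nested result dict."""
    hits = []
    for k, v in d.items():
        path = f"{prefix}/{k}" if prefix else k
        if isinstance(v, dict):
            hits.extend(find_timing_keys(v, path))
        elif "time" in k or "throughput" in k:
            hits.append((path, v))
    return hits

# Stand-in for a trainer.train() result dict (illustrative values only).
result = {
    "timers": {"sample_time_ms": 12.3, "learn_time_ms": 45.6},
    "info": {"learner": {"default_policy": {"cur_lr": 0.0005}}},
    "sampler_results": {"episode_reward_mean": 100.0},
}

for path, value in find_timing_keys(result):
    print(path, value)
```

With PPO the learn_* keys show up under timers, but with APPO the equivalent entries are simply absent.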


  1. Where are the throughput values for APPO?
  2. Why are sample times low?
  3. Why does APPO consume so much memory?