Memory efficiency in extremely long-horizon environments

I have an environment with a horizon of well over 1000 timesteps and up to 1024 concurrent agents. Running PPO, a single copy of the environment easily exhausts 64 GB of RAM. OpenAI Five (pages 31-32) uses a value-function-bootstrapping-inspired approach (they also predict win probability) and splits trajectories into smaller segments. Would something similar be possible in RLlib, or do you have other ideas for supporting very long time horizons?
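For reference, the segment-splitting idea described above can be sketched roughly like this: instead of holding a full >1000-step trajectory in memory, each fixed-size segment is processed on its own, with the value estimate at the first state *after* the segment standing in for all truncated future rewards. This is an illustrative sketch, not RLlib or OpenAI Five code; the names `segment_returns`, `rewards`, and `bootstrap_value` are made up here.

```python
def segment_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one truncated trajectory segment.

    `bootstrap_value` is V(s) at the first state after the segment,
    replacing all rewards beyond the truncation point.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        # Standard backward recursion: G_t = r_t + gamma * G_{t+1},
        # seeded with the bootstrap value at the segment edge.
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# Example: a 4-step segment cut out of a much longer episode.
rets = segment_returns([1.0, 0.0, 0.0, 1.0], bootstrap_value=2.0, gamma=0.5)
```

Memory then scales with the segment length rather than the full horizon, at the cost of some bias from the bootstrapped value estimate.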


Actually, the new trajectory view API might help you here. It's enabled by default for most major algos ((A|DD)PPO, SAC, DQN, A2/3C, IMPALA, DDPG, TD3, PG) in both torch and tf, and it saves some memory during sample collection (e.g. next_obs is not passed on to the trainer process).
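A minimal config sketch, assuming the RLlib ~1.0-era config keys (on recent versions the trajectory view flag is already on by default, so you usually don't need to set it; check your version's defaults):

```python
# Hypothetical PPO config fragment; keys other than "framework" and
# "num_workers" should be checked against your installed RLlib version.
config = {
    "framework": "torch",
    "_use_trajectory_view_api": True,   # memory-saving sample collection
    "num_workers": 4,
    # Shipping shorter rollout fragments also bounds per-worker memory.
    "rollout_fragment_length": 200,
}
```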
For LSTMs - with the trajectory view API - only the needed internal-state vectors (at the max_seq_len chunk edges) are transferred and stored. There is also no more double storage of state_in/state_out: they hold the same data, just shifted by one timestep, so we save ~50% of the state memory during sample collection.
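A toy illustration (plain Python, not RLlib code) of both points: state_in at step t equals state_out at step t-1, so one array plus the initial state reconstructs both views, and only the states at the max_seq_len chunk edges need to be kept to re-run each chunk.

```python
# Per-timestep LSTM states emitted over a 5-step rollout (toy values).
state_out = [f"h{t}" for t in range(5)]

# state_in is the same data shifted by one timestep, so storing it
# separately would double the memory for no new information.
state_in = ["h_init"] + state_out[:-1]

# With max_seq_len chunking, only the state at the start of each chunk
# is needed to replay that chunk through the LSTM at train time.
max_seq_len = 2
kept = state_in[::max_seq_len]
```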