Offline training using previously stored (obs, action, reward) tuples

Hi,

Is it possible to store tuples of (obs, action, reward) and then use them to train models? This is mainly for the case of changing hyperparameters: instead of rerunning expensive models/environments, we could reuse the previous data to speed up training to a degree.

Thanks in advance,
Denys A.

Hey @Denys_Ashikhin , yes, this is usually done by our off-policy algorithms, like DQN, SAC, DDPG, CQL, and TD3.
If you look at their execution plans (e.g. ray/rllib/agents/dqn/dqn.py::execution_plan), you will see that we create a LocalReplayBuffer there that is used to store experience tuples from the environment rollouts, and we then re-use those samples repeatedly for the training updates.
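As an illustration only (not from the reply above), here is a minimal sketch using the old `ray.rllib.agents.dqn.DQNTrainer` API that the file path above refers to. The replay-buffer keys show the re-use of sampled tuples within a run, and the separate `"output"`/`"input"` offline-data options are an additional way to persist experiences to disk and train on them later with different hyperparameters. The env name, paths, and config values are placeholders.

```python
# Sketch, assuming the pre-2.0 "agents" API (DQNTrainer) and a toy env.
import ray
from ray.rllib.agents.dqn import DQNTrainer

ray.init()

# First run: train DQN as usual; experiences go into the local replay buffer
# and are also written out to disk as JSON files via "output".
collect_config = {
    "env": "CartPole-v0",
    "buffer_size": 50000,           # replay buffer capacity
    "learning_starts": 1000,        # timesteps before sampling from the buffer
    "train_batch_size": 32,
    "output": "/tmp/cartpole-out",  # placeholder path for recorded rollouts
}
trainer = DQNTrainer(config=collect_config)
for _ in range(10):
    trainer.train()
trainer.stop()

# Later run: change hyperparameters and read the stored experiences with
# "input" instead of stepping the (expensive) environment again.
offline_config = {
    "env": "CartPole-v0",
    "input": "/tmp/cartpole-out",   # read previously recorded experiences
    "input_evaluation": [],         # skip off-policy estimation (needs action probs)
    "explore": False,               # no live exploration when learning offline
    "train_batch_size": 64,         # e.g. a new hyperparameter setting
}
offline_trainer = DQNTrainer(config=offline_config)
offline_trainer.train()
```

For purely offline training with no environment interaction at all, CQL (listed above) is the algorithm designed for that setting.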