Hi,
Is it possible to store a tuple of (obs, action, reward) to then use for training models? This is mainly in the case of changing hyperparameters and instead of rerunning expensive models/environments we can use previous data to speed up training to a degree.
Thanks in advance,
Denys A.
Hey @Denys_Ashikhin , yes, this is usually done by our off-policy algorithms, like DQN, SAC, DDPG, CQL, and TD3.
If you look at their execution plans (e.g. `ray/rllib/agents/dqn/dqn.py::execution_plan`), you will see that we create a `LocalReplayBuffer` in there that stores the experience tuples from the environment rollouts; the samples therein are then re-used repeatedly for the training updates.