The original DQN algorithm samples experiences randomly from the replay buffer. Is there any way to sample experiences in batches that preserve sequentiality, for DQN or PPO?
Yes, for DQN. PPO does not use a replay buffer.
Check out the current master branch and have a look at the replay_buffer_config attribute of the DQNConfig. You can set the storage_unit there to "sequences".
This obviously has implications for the rest of the algorithm, so you will want to choose your sequencing parameters (sequence length, burn-in, etc.) accordingly.
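Roughly something like this (an untested sketch against a recent master / Ray 2.x install; the environment, buffer type, and the specific sequencing values are just placeholders, and exact key names may vary between versions):

```python
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")  # example env, not from this thread
    .training(
        replay_buffer_config={
            "type": "MultiAgentReplayBuffer",
            # Store and replay whole sequences instead of single timesteps:
            "storage_unit": "sequences",
            # Sequencing parameters -- illustrative values only:
            "replay_sequence_length": 4,
            "replay_burn_in": 0,
            "replay_zero_init_states": True,
        }
    )
)

algo = config.build()
results = algo.train()
```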
Master also features docs on replay buffers.
Best