Creating buffer for PPO

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

To stabilize my agent further, I’m going to try using a buffer for PPO. It’s worth mentioning that OpenAI Five also used a buffer, although not in the same way I intend to use it.

My goal is to collect experiences from multiple episodes and sample X sequences of length Y to train my model (an RNN). Once sampled, these experiences should be removed from the buffer.
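As a starting point, here is a minimal sketch of such a buffer in plain Python. All names (`SequenceBuffer`, `add_episode`, `sample_sequences`) are hypothetical, not RLlib APIs: it stores whole episodes, samples non-overlapping sequences of a fixed length, and deletes the sampled slices so they cannot be reused.

```python
import random
from collections import deque


class SequenceBuffer:
    """Hypothetical episode buffer for an RNN policy: stores whole
    episodes, samples fixed-length sequences, and discards them after
    sampling. A sketch only, not an RLlib class."""

    def __init__(self, max_episodes=1000):
        # Each episode is a list of transitions (dicts, tuples, ...).
        self.episodes = deque(maxlen=max_episodes)

    def add_episode(self, transitions):
        self.episodes.append(list(transitions))

    def sample_sequences(self, num_sequences, seq_len):
        # Enumerate all valid, non-overlapping (episode, start) slices.
        candidates = [
            (ep_idx, start)
            for ep_idx, ep in enumerate(self.episodes)
            for start in range(0, len(ep) - seq_len + 1, seq_len)
        ]
        chosen = random.sample(candidates, min(num_sequences, len(candidates)))
        batch = [self.episodes[i][s:s + seq_len] for i, s in chosen]
        # Delete sampled slices in reverse order so earlier indices
        # within the same episode stay valid, then drop empty episodes.
        for i, s in sorted(chosen, reverse=True):
            del self.episodes[i][s:s + seq_len]
        self.episodes = deque(
            (ep for ep in self.episodes if ep), maxlen=self.episodes.maxlen
        )
        return batch
```

In practice you would pad or mask sequences shorter than Y and also carry the RNN state at each sequence start, but the remove-after-sampling mechanic is the part sketched here.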

I have absolutely no idea where to start, and I'm also unsure whether the buffer can use CPU RAM or GPU RAM, depending on the implementation. Additionally, I would like to know which type of RAM RLlib uses for its buffers.

I really need help with this, guys. Can anyone give me a clue where to start?