How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
To stabilize my agent further, I’m going to try using a buffer for PPO. It’s worth mentioning that OpenAI Five also used a buffer, although not in the same way I intend to use it.
My goal is to collect experiences from multiple episodes and sample X sequences of length Y to train my model (an RNN). Once sampled, these experiences should be removed from the buffer.
I have absolutely no idea where to start, and I'm also unsure whether the buffer can live in CPU RAM or GPU memory, depending on the implementation. I would also like to know which type of memory RLlib uses for its buffers.
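As a starting point, here is a minimal sketch of the kind of buffer described above: it stores whole episodes, samples contiguous sequences for RNN training, and deletes sampled transitions. This is not RLlib's actual buffer API; the class and method names (`SequenceReplayBuffer`, `sample_sequences`) are hypothetical, and the data lives in ordinary CPU RAM as Python objects (moving a sampled batch to GPU memory would happen at training time, e.g. with your framework's tensor transfer call).

```python
import random
from collections import deque


class SequenceReplayBuffer:
    """Hypothetical episode buffer sketch (not RLlib's API).

    Stores episodes as lists of transitions in CPU RAM; samples
    contiguous sequences and removes them after sampling.
    """

    def __init__(self, capacity_episodes=100):
        # Oldest episodes are evicted once capacity is reached.
        self.episodes = deque(maxlen=capacity_episodes)

    def add_episode(self, transitions):
        # transitions: list of per-step tuples, e.g. (obs, action, reward)
        self.episodes.append(list(transitions))

    def sample_sequences(self, num_sequences, seq_len):
        """Draw up to `num_sequences` contiguous sequences of length
        `seq_len`, removing the sampled transitions from the buffer."""
        sequences = []
        for _ in range(num_sequences):
            # Only episodes still long enough can supply a sequence.
            candidates = [ep for ep in self.episodes if len(ep) >= seq_len]
            if not candidates:
                break
            ep = random.choice(candidates)
            start = random.randrange(len(ep) - seq_len + 1)
            sequences.append(ep[start:start + seq_len])
            # Delete the sampled slice so it is not reused.
            del ep[start:start + seq_len]
        # Drop episodes that have been fully consumed.
        self.episodes = deque(
            (ep for ep in self.episodes if ep),
            maxlen=self.episodes.maxlen,
        )
        return sequences
```

For example, after adding one 10-step episode, sampling 2 sequences of length 3 returns 2 sequences and leaves 4 transitions in the buffer.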