Add the experiences to the buffer "by hand"

I am working with RL code that implements algorithms via TensorForce. Training works by adding the experiences to the buffer through a method on the agent; whenever the buffer reaches the step size, training starts.

A minimal example would be:

state = env.reset()
for i in range(timesteps):
    # Step the environment and add the resulting transition to the buffer by hand.
    action = agent.predict(state)
    next_state, reward, done, _ = env.step(action)
    agent.add_to_buffer(state, action, reward, next_state, done)
    state = env.reset() if done else next_state

Adding the experiences by hand is necessary in this case due to the particular structure of the environment. I am aware that RLlib is the state-of-the-art library for RL, and I would like to move the agents from the TensorForce implementation to RLlib. My question is: is there any way to create equivalent code using RLlib, i.e. passing the experiences to the agent externally instead of having them collected under the hood during training?

Hi @carlorop ,

@Lars_Simon_Zehnder has written a few lines on this before.
I am not aware of any "1-liner" solution for this in RLlib right now.
Nevertheless, it is possible. Do you need help with coding it? This is a use case that I am interested in, and we can work on it together if you like. :slight_smile:


Please share the results, if you do! :pray:

Hi @arturn. Undoubtedly it would add considerable value to RLlib. However, I will use other libraries in the meantime. If you manage to implement it, I would be eternally thankful.

Ok, I did not expect that. I will ask Sven whether there are plans for this and whether he has any recommendations on how it should be approached.

I guess you can fill the replay buffer with the agent's predictions using a strategy similar to the one employed to generate offline datasets:

RLlib Offline Datasets — Ray v1.9.0, section "Example: Converting external experiences to batch format"
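
Roughly, that docs example boils down to something like the following sketch (my own adaptation, not a drop-in solution; the module paths, the output directory, and the exact set of batch columns are assumptions that may differ across Ray versions):

# Sketch: converting externally collected experiences into RLlib's batch format,
# following the "converting external experiences" docs example.
from ray.rllib.evaluation.sample_batch_builder import SampleBatchBuilder
from ray.rllib.offline.json_writer import JsonWriter

batch_builder = SampleBatchBuilder()
writer = JsonWriter("/tmp/demo-out")  # placeholder output directory

state = env.reset()
for t in range(timesteps):
    action = agent.predict(state)  # your existing, externally controlled agent
    next_state, reward, done, _ = env.step(action)
    batch_builder.add_values(
        t=t,
        eps_id=0,
        obs=state,
        actions=action,
        rewards=reward,
        dones=done,
        new_obs=next_state,
    )
    state = env.reset() if done else next_state

# Write the collected transitions as an offline dataset that RLlib can train from.
writer.write(batch_builder.build_and_reset())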


You can, of course, instantiate a ReplayBuffer and call add() to add experiences. Or use ReplayActors like in Ape-X. Or you can write an execution_plan that makes use of a ReplayBuffer.
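
A minimal sketch of the first option (instantiating a buffer yourself and adding experiences "by hand") could look like this. It assumes a recent Ray version where the buffers live under ray.rllib.utils.replay_buffers; in Ray 1.9 the corresponding classes sit in ray.rllib.execution.replay_buffer and the signatures differ slightly:

# Sketch: manually feeding single transitions into an RLlib replay buffer.
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers import ReplayBuffer

buffer = ReplayBuffer(capacity=50_000)

state = env.reset()
for _ in range(timesteps):
    action = agent.predict(state)  # your externally controlled agent
    next_state, reward, done, _ = env.step(action)
    # Wrap the single transition in a SampleBatch and add it to the buffer.
    buffer.add(SampleBatch({
        "obs": [state],
        "actions": [action],
        "rewards": [reward],
        "new_obs": [next_state],
        "dones": [done],
    }))
    state = env.reset() if done else next_state

# Later, draw training batches from the buffer.
train_batch = buffer.sample(32)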

I see two ways here:

  • If you plan to run your code on your own machine: instantiate a ReplayBuffer and use it like in your example. In this case you can use an offline dataset to fill it, as @felipeeeantunes suggested (see the sketch after this list).
  • If you plan to use RLlib and Ray to their full capacities, I suspect you will not get around writing code that does not look as simple as your snippet, especially if you want to gradually mix in some experiences manually rather than only filling the buffer at the beginning.
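
For the offline-dataset route, pre-filling such a buffer could look roughly like this (a sketch under the same version assumptions as above; the input path and the number of batches to pre-load are placeholders):

# Sketch: pre-filling a replay buffer from an offline dataset written earlier,
# e.g. with the JsonWriter approach shown above.
from ray.rllib.offline.json_reader import JsonReader
from ray.rllib.utils.replay_buffers import ReplayBuffer

buffer = ReplayBuffer(capacity=50_000)
reader = JsonReader("/tmp/demo-out")  # directory containing the offline JSON files

for _ in range(100):  # number of batches to pre-load; arbitrary here
    buffer.add(reader.next())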

Hey @carlorop, this is a very valid question! I guess it is caused by the fact that in RLlib, adding to and reading from the buffer is usually done under the hood via the execution plan.
You can indeed access the buffer via trainer.local_replay_buffer and then call the buffer's add_batch([some sample batch to add]) method.
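
Put together, that approach could look roughly like the sketch below. It assumes a Ray 1.9-style DQNTrainer; the attribute name (local_replay_buffer) and the add_batch method follow the description in this thread and may differ or require additional batch columns in other Ray versions:

# Sketch: pushing externally collected experiences into a trainer's local buffer.
import ray
from ray.rllib.agents.dqn import DQNTrainer
from ray.rllib.policy.sample_batch import SampleBatch

ray.init()
trainer = DQNTrainer(config={"framework": "tf"}, env="CartPole-v0")

# A hand-built batch of one transition (column names as commonly used by RLlib).
batch = SampleBatch({
    "obs": [obs],
    "actions": [action],
    "rewards": [reward],
    "new_obs": [next_obs],
    "dones": [done],
})

# Add the externally collected experiences to the trainer's replay buffer;
# subsequent trainer.train() calls would then sample from this buffer.
trainer.local_replay_buffer.add_batch(batch)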