[RLlib] Difference between ParallelRollouts(workers) vs workers.sample()

I’m trying to create my own RL algorithm, which requires modifying the execution_plan function.

But to my understanding, workers.sample() and ParallelRollouts basically do the same thing.

So I don’t quite understand the difference between the two approaches to collecting trajectories with remote workers. Could they be used interchangeably? Can I use either one of them in execution_plan? Some clarification in the docs would be a nice addition, since the two are documented right next to each other.

Edit: Another question. Is workers.sample() compatible with a replay buffer? Do I have to implement my own buffer if I use it this way?


Hey @51616, great question!
Think about the execution plan as building a static graph, just like you would in TensorFlow (1.x) when building a model. The logic you define here does not get executed right away; instead, you are telling the agent what it should (repeatedly) execute while running.

In other words, if you used workers.sample() inside your execution plan, that line of code would be executed exactly once (when the execution plan is created at the beginning of your RLlib run).
ParallelRollouts, on the other hand, creates a callable object, which is then called repeatedly to perform the rollouts.
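
For reference, a minimal plan built this way looks roughly like the following. This is a sketch assuming an older RLlib version (around 1.x) that still exposes the execution_plan API and the ops under rllib/execution:

```python
from ray.rllib.execution.rollout_ops import ParallelRollouts
from ray.rllib.execution.train_ops import TrainOneStep
from ray.rllib.execution.metric_ops import StandardMetricsReporting

def execution_plan(workers, config):
    # ParallelRollouts returns an iterator object; nothing is sampled yet.
    rollouts = ParallelRollouts(workers, mode="bulk_sync")
    # Each element pulled from the iterator triggers one round of sampling,
    # and for_each chains a training step onto every sampled batch.
    train_op = rollouts.for_each(TrainOneStep(workers))
    # The trainer repeatedly pulls from the returned iterator during training.
    return StandardMetricsReporting(train_op, workers, config)
```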

If you want to go into more detail, you can do ray.init(local_mode=True), set a debug breakpoint in the ParallelRollouts __call__ method, and see that this call is executed each time we do a training iteration. If you instead place a breakpoint in your execution plan function where you call workers.sample(), you will notice that that line is only ever executed once (at the beginning, when the plan is created).
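
A minimal setup for that kind of debugging could look like this (just a sketch: PGTrainer and CartPole-v0 are arbitrary choices here, and this assumes a pre-2.0 RLlib where ray.init(local_mode=True) and ray.rllib.agents.pg are available):

```python
import ray
from ray.rllib.agents.pg import PGTrainer  # any built-in algo works for this exercise

# In local mode, all "remote" work runs in the driver process, so ordinary
# debugger breakpoints placed inside RLlib's execution ops are actually hit.
ray.init(local_mode=True)

trainer = PGTrainer(env="CartPole-v0", config={"framework": "torch", "num_workers": 1})
trainer.train()  # step into this call to watch the plan's ops being pulled each iteration
```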


@sven1977 What if I create a function that returns a generator that does workers.sample() + learn_on_batch() every time the generator is iterated over? Will there be a performance hit if I do things this way as opposed to using ParallelRollouts + .for_each(TrainOneStep)?
I’m trying to create my own data flow that could be unusual compared to what Ray provides, and the first route seems more intuitive to me.

I think that’s exactly what you are looking for. Basically, this is what ParallelRollouts in combination with TrainOneStep already does in lots of our built-in algos. You can take a look at the rllib/execution dir to find all these callable classes and get some ideas from them. But yes, you can of course also pack this into a single op. You just have to be careful that you don’t introduce something sequential and break the parallelization.
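
To make the generator idea a bit more concrete, here is a rough sketch of what such a combined sample-and-train op could look like. This is my own illustration, not an official RLlib utility; it assumes the pre-2.0 WorkerSet API with at least one remote worker, and to actually plug it into an execution_plan you would still need to wrap it into a LocalIterator (e.g. via ray.util.iter):

```python
import ray
from ray.rllib.policy.sample_batch import SampleBatch

def sample_and_train(workers):
    """Generator that runs one sample + train + weight-sync cycle per iteration.

    `workers` is assumed to be the WorkerSet passed into execution_plan.
    """
    while True:
        # 1) Collect one rollout from every remote worker in parallel
        #    (this blocks until all workers have returned their batch).
        batches = ray.get([w.sample.remote() for w in workers.remote_workers()])
        train_batch = SampleBatch.concat_samples(batches)

        # 2) Update the local (learner) policy on the combined batch.
        info = workers.local_worker().learn_on_batch(train_batch)

        # 3) Broadcast the new weights back to the remote workers.
        weights = ray.put(workers.local_worker().get_weights())
        for w in workers.remote_workers():
            w.set_weights.remote(weights)

        yield info
```

Note that step 1 blocks until every worker has returned its batch, which is exactly the kind of sequential synchronization point mentioned above; whether that is acceptable depends on your algorithm.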
