Hi, I find it hard to understand `Concurrently` in `ray.rllib.execution.concurrency_ops` (Ray 1.11.0). Could you please provide a figure to illustrate how it works?
Hi @GoingMyWay,
the `Concurrently` operator can execute a number of `LocalIterator`s in different modes (the most often used one is `round_robin` with equal weights, which then executes the operations in the `ops` list alternately).
Almost every operation in RLlib (and Ray in general) is packed into an iterator. Let us take the DQN agent as an example. In the `execution_plan()` of this agent, two main operations are constructed: the `store_op` (which collects experiences with the workers and writes them into the replay buffer) and the `replay_op` (which samples batches from the replay buffer, makes some SGD steps, updates the priorities in the replay buffer, and updates the target network, if necessary).
These two operations are each defined via a `LocalIterator` object that iterates over the operations to be done in each step. If you call the `LocalIterator` of the `store_op`, you execute a rollout that collects experiences in the environment (exactly `max(num_workers, 1) x rollout_fragment_length` of them) and then stores the resulting batch of these experiences in the replay buffer. If you call the `LocalIterator` of the `replay_op`, it samples some batches of experiences (of size `train_batch_size` each) and uses them to make some SGD steps.
To train your agent, you want to repeatedly collect experiences and train on new experience batches. For this, the `Concurrently` operator is used: it makes a union of the `LocalIterator`s in the `ops` list and calls their `next()` functions in alternating order.
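To make the alternation concrete, here is a toy sketch (plain Python generators, not Ray's actual `Concurrently`/`LocalIterator` implementation) of two ops sharing a replay buffer and being stepped in round-robin fashion with equal weights:

```python
import random
from collections import deque
from itertools import islice

# Toy stand-ins for store_op / replay_op (NOT the real RLlib iterators).
replay_buffer = deque(maxlen=100)

def store_op():
    """Each next() call 'collects a rollout' and stores it in the buffer."""
    step = 0
    while True:
        replay_buffer.append(f"batch-{step}")  # pretend rollout result
        yield f"stored batch-{step}"
        step += 1

def replay_op():
    """Each next() call 'samples a train batch' from the buffer."""
    while True:
        if replay_buffer:
            sample = random.choice(replay_buffer)
            yield f"trained on {sample}"

def concurrently_round_robin(ops):
    """Alternate next() calls over the ops, like round_robin with equal weights."""
    while True:
        for op in ops:
            yield next(op)

train_op = concurrently_round_robin([store_op(), replay_op()])
results = list(islice(train_op, 4))
print(results)  # alternates: store, train, store, train, ...
```

Each call to the combined iterator advances exactly one of the underlying ops, so sampling and training strictly interleave.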
@Lars_Simon_Zehnder Thanks for the quick response and the helpful answer. When using `round_robin` mode, does it mean that at each training iteration the trainer first calls `store_op` and then calls `train_op`? Can I add another op to it?
Update:
I tried to add a new op, but it seems that the other ops are not called more frequently during training than the newly added one.
you are almost right. The `Trainable` iterates during each `step()` once through the `store_op` and then one or more times through the `train_op` (and even during a single `train_op` call, several batches can be used to make several SGD steps).
When adding operations to the `ops` list of the `Concurrently` operator, make sure that you return the right operation output. You can control this via `output_indexes`. For example, the `execution_plan` of the `DQNTrainer` returns only the output of the second `LocalIterator`:
```python
train_op = Concurrently(
    [store_op, replay_op],
    mode="round_robin",
    output_indexes=[1],
    round_robin_weights=calculate_rr_weights(config),
)
```
If you add another operation and that operation should return a value, you need to specify it in `output_indexes` as well.
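The effect of `output_indexes` can be imitated with a small sketch (again a simplified stand-in, not Ray's code): all ops are still stepped every round, but only results from the selected indexes reach the consumer.

```python
def concurrently(ops, output_indexes):
    """Round-robin over ops, yielding only results from the selected indexes
    (a simplified imitation of Concurrently's output_indexes behavior)."""
    while True:
        for i, op in enumerate(ops):
            value = next(op)
            if i in output_indexes:
                yield value

def numbers():
    n = 0
    while True:
        yield n
        n += 1

def letters():
    for ch in "abcdef":
        yield ch

# Only index 1 (letters) reaches the consumer; numbers still advances each round.
out = concurrently([numbers(), letters()], output_indexes=[1])
first_three = [next(out) for _ in range(3)]
print(first_three)  # ['a', 'b', 'c']
```

This mirrors why the DQN plan above sets `output_indexes=[1]`: the `store_op` still runs every round, but only the `replay_op`'s training results are returned to the trainer.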
When using `round_robin` mode you can also change the `round_robin_weights` and define some operations to be run more often. For example, if you set `[2, 1, "*"]` with three operations in the `ops` list, `Concurrently` in `round_robin` mode will iterate the first `LocalIterator` twice (maybe sampling and storing), the second `LocalIterator` (maybe training) once, and the third operator until it is empty.
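As a rough sketch of those weight semantics (simplified; not Ray's actual scheduling code), an integer weight `w` means "pull `w` items per round" and `"*"` means "drain that op each round":

```python
def weighted_round_robin(ops, weights):
    """Toy sketch of round_robin_weights: integer weight w pulls w items
    per round; '*' drains the op completely each round. Stops once every
    op is exhausted in the same round."""
    while True:
        exhausted = 0
        for op, w in zip(ops, weights):
            if w == "*":
                drained_any = False
                for item in op:  # drain until empty
                    drained_any = True
                    yield item
                if not drained_any:
                    exhausted += 1
            else:
                for _ in range(w):
                    try:
                        yield next(op)
                    except StopIteration:
                        exhausted += 1
                        break
        if exhausted == len(ops):
            return

sample = iter(["s1", "s2", "s3", "s4"])
train = iter(["t1", "t2"])
extra = iter(["e1", "e2", "e3"])

order = list(weighted_round_robin([sample, train, extra], [2, 1, "*"]))
print(order)
```

The first round pulls two items from `sample`, one from `train`, and everything from `extra`, matching the `[2, 1, "*"]` description above.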
The `DQNTrainer` uses the `calculate_rr_weights()` function to compute the `round_robin_weights`. It relies on your configuration: with the `training_intensity` configuration parameter you have a knob that helps you adjust the number of training iterations relative to the number of sampling iterations. But feel free to hard-code the `round_robin_weights` in your own implementations with Ray.
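The idea behind that computation can be sketched roughly as follows (an approximate reconstruction for illustration; see RLlib's DQN source for the exact formula): the replay op's weight is scaled so that the ratio of trained steps to sampled steps approaches `training_intensity`.

```python
def calculate_rr_weights(config):
    """Approximate sketch of deriving [sample_weight, train_weight] from
    training_intensity (simplified; not guaranteed to match RLlib exactly)."""
    if not config.get("training_intensity"):
        return [1, 1]  # sample and train equally often
    # How many env steps one train round consumes vs. one sampling round collects.
    native_ratio = config["train_batch_size"] / config["rollout_fragment_length"]
    # Weight the replay_op so (steps_trained / steps_sampled) approaches
    # the requested training_intensity.
    return [1, config["training_intensity"] / native_ratio]

cfg = {
    "train_batch_size": 32,
    "rollout_fragment_length": 4,
    "training_intensity": 16,
}
print(calculate_rr_weights(cfg))  # [1, 2.0] -> train twice per sampling round
```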
Dear @Lars_Simon_Zehnder, thank you. I am trying to debug and make all the ops get called in the right order and with equal frequency.