Apologies for raising a question that may seem trivial to many users, but I still find myself unclear on how RLlib facilitates the parallelization of gradient computations.
I understand that multiple rollout workers are used to collect experiences, from which the Learner (or RLModule) then samples for gradient optimization.
What I'd like to understand is how RLlib works on a cluster, specifically how gradient computation and aggregation are handled across learners.
Am I right in summarizing the process as follows?
- The model (neural network) is duplicated and distributed to each Learner (if `num_learner_workers` > 0).
- Every learner receives experiences from the rollout workers for sampling.
- After sampling, each learner computes its gradients and sends them to the head node, whose task is to gather and aggregate them.
- Depending on the aggregation strategy (synchronous or asynchronous), the head node then updates the main model weights according to the gradients received from the learners.
- Finally, every Learner’s model is synchronized with the main one.
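
To make the question concrete, here is a minimal sketch of the synchronous variant of the steps above as I picture them. It is plain Python, not RLlib code: the linear model, the function names (`local_gradient`, `average_gradients`, `sync_training_step`) and the synthetic batches are all hypothetical stand-ins, and the "head node" is just a function that averages the gradients.

```python
import random


def local_gradient(weights, batch):
    """Gradient of the mean squared error of a linear model y = w . x on one batch."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        pred = sum(w * xi for w, xi in zip(weights, x))
        err = pred - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def average_gradients(grads):
    """Hypothetical 'head node' aggregation: element-wise mean over learners."""
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]


def sync_training_step(main_weights, learner_batches, lr=0.1):
    # Step 1: every learner starts from a copy of the main weights.
    replicas = [list(main_weights) for _ in learner_batches]
    # Steps 2-3: each learner computes a gradient on its own batch of experiences.
    grads = [local_gradient(w, b) for w, b in zip(replicas, learner_batches)]
    # Step 4: the gradients are aggregated and applied to the main weights.
    avg = average_gradients(grads)
    new_weights = [w - lr * g for w, g in zip(main_weights, avg)]
    # Step 5: the returned weights are what every learner copies in the next step.
    return new_weights


rng = random.Random(0)
true_w = [2.0, -1.0]  # synthetic target, only there so the example has something to learn


def make_batch(n):
    """Stand-in for experiences delivered by the rollout workers."""
    batch = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = sum(w * xi for w, xi in zip(true_w, x))
        batch.append((x, y))
    return batch


weights = [0.0, 0.0]
for _ in range(300):
    batches = [make_batch(8) for _ in range(4)]  # 4 simulated learners
    weights = sync_training_step(weights, batches)

print(weights)
```

In this sketch all learners wait for each other before the update is applied; in the asynchronous case I would expect the head node to apply each learner's gradient as soon as it arrives, but that is exactly the part I am unsure about.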
Could you please confirm whether this is accurate? I would greatly appreciate any guidance or suggestions.