[RLlib] Proper number for PPO rollout workers

Mirakolix_Gallier · August 1, 2022, 3:20pm

How severe does this issue affect your experience of using Ray?

None: Just asking a question out of curiosity

How do I determine the best number of PPO rollout workers?

I want to use parallelization with the PPO Trainer class, and I am wondering what a proper value for num_workers would be, and especially whether there can be too many workers.

Should I just max out my machine and set num_workers to the number of CPU cores (24) minus one for the local worker? Or can there be too many workers, in the sense that the local worker receives too many policy updates at the same time, which makes training less efficient? Are there any techniques to “tune” the number of workers, or rules of thumb for a proper number? Or does this depend strongly on the environment, the model, and the configuration?

Grateful for any advice!

rusu24edward · August 1, 2022, 4:07pm

Here is a helpful rule of thumb: Training APIs — Ray 1.13.0

Here is a similar issue where I ask about what seems to be a performance slowdown with respect to the number of workers (unfortunately I have not had time to explore this more): Num workers speedup?

I suggest you perform a few scaling studies to see what works well for your computer+algorithm+simulation.
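
A scaling study like this could be sketched roughly as below. This is only a minimal illustration, not RLlib code: `scaling_study`, `best_worker_count`, and `make_runner` are hypothetical names. The assumption is that `make_runner(n)` builds a trainer with `num_workers=n` and returns a function that runs one training iteration and reports how many environment steps it sampled.

```python
import time
from typing import Callable, Dict, Iterable


def scaling_study(
    make_runner: Callable[[int], Callable[[], int]],
    worker_counts: Iterable[int],
    iterations: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, float]:
    """Measure sampled env steps per second for each candidate worker count.

    make_runner is a hypothetical factory (an assumption of this sketch):
    given n, it sets up training with n rollout workers and returns a
    callable that runs one iteration and returns the env steps it sampled.
    """
    throughput = {}
    for n in worker_counts:
        run_iteration = make_runner(n)
        run_iteration()  # warm-up iteration, excluded from the timing
        start = clock()
        steps = sum(run_iteration() for _ in range(iterations))
        elapsed = clock() - start
        throughput[n] = steps / elapsed
    return throughput


def best_worker_count(throughput: Dict[int, float]) -> int:
    """Return the worker count with the highest measured throughput."""
    return max(throughput, key=throughput.get)
```

Throughput is only one thing to look at. It is probably also worth comparing wall-clock time to reach a target reward, since more workers can change the effective batch that PPO trains on.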

Mirakolix_Gallier · August 4, 2022, 1:18pm

I learned that OpenAI Five used 57,600 rollout workers, so running 20-50 workers should definitely not be a problem, I guess.
OpenAI Five Dota 2 paper
