My team is using PPO with the default exploration settings and a stopping criterion based on the stability of the episode reward mean. When we increased the number of workers, convergence time decreased at first, but then it started increasing again.
Hey @Saurabh_Arora, great question. This kind of makes sense. PPO is a synchronous learner, meaning all workers' sample() calls are executed in parallel and collected before(!) the learning step happens on the concatenation of all the collected samples. What happens when you add more workers is that your rollout_fragment_length also gets adjusted accordingly to make sure the train batch size stays as you configured it. You can try increasing your train_batch_size parameter at the same time as you increase num_workers.
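A minimal sketch of what that could look like with the classic dict-style RLlib config and Ray Tune (the env, stopping value, and concrete numbers here are placeholders, not your actual setup):

```python
import ray
from ray import tune

ray.init()

num_workers = 8
rollout_fragment_length = 200  # assumed value, just for illustration

config = {
    "env": "CartPole-v1",  # placeholder env
    "num_workers": num_workers,
    "rollout_fragment_length": rollout_fragment_length,
    # Scale train_batch_size along with num_workers so that
    # rollout_fragment_length does not get shrunk to keep the batch size fixed:
    # train_batch_size ~= num_workers * num_envs_per_worker * rollout_fragment_length
    "train_batch_size": num_workers * rollout_fragment_length,
}

tune.run(
    "PPO",
    config=config,
    stop={"episode_reward_mean": 150},  # placeholder stopping criterion
)
```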
Another alternative would be to try an async algo, such as APPO or IMPALA, which will most likely scale better with the number of workers than PPO.
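Switching is mostly a matter of changing the trainable name; a hedged sketch, reusing the placeholder values from above (IMPALA is configured analogously via "IMPALA"):

```python
# Async variant: workers keep sampling while the learner updates,
# so adding workers is less likely to slow down the train step.
tune.run(
    "APPO",
    config={
        "env": "CartPole-v1",  # placeholder env
        "num_workers": num_workers,
        "rollout_fragment_length": rollout_fragment_length,
    },
    stop={"episode_reward_mean": 150},  # placeholder stopping criterion
)
```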