Convergence Time and num_workers

Hey All,

My team is using PPO with the default exploration settings and a stopping criterion based on the stability of the episode reward mean. Increasing the number of workers decreased convergence time at first, but beyond a certain point it started increasing again.

4 workers: 100 minutes
16 workers: 76 minutes
32 workers: 53 minutes
48 workers: 105 minutes
64 workers: 118 minutes

Is it possible to decrease convergence time further? Could modifying the exploration config help?

cc: @sven1977 @rusu24edward @mannyv

Hey @Saurabh_Arora , great question. This actually makes sense. PPO is a synchronous learner: all workers' sample() calls are executed in parallel, and the results are collected before(!) the learning step happens on a concatenation of all the collected samples. When you add more workers, rollout_fragment_length gets adjusted accordingly to make sure the train batch size remains as you configured it. You can try increasing your train_batch_size parameter at the same time as you increase num_workers.
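To illustrate the effect, here is a rough sketch (not RLlib's exact internals; function names are illustrative) of the relationship: with a fixed train_batch_size, each worker's rollout fragment shrinks as workers are added, and one way to compensate is to scale the batch size with the worker count.

```python
# Approximate per-worker fragment length for a fixed train batch size
# (simplified; the real calculation also involves num_envs_per_worker).
def rollout_fragment_length(train_batch_size: int, num_workers: int) -> int:
    return train_batch_size // max(num_workers, 1)

# With train_batch_size=4000, fragments shrink quickly as workers grow:
# 4 workers -> 1000 steps each, 32 -> 125, 64 -> 62 steps per worker per round.

# One way to compensate: grow train_batch_size proportionally to the worker
# count so per-worker fragments keep a useful length (assumed baseline: 4 workers).
def scaled_batch_size(num_workers: int, base_batch: int = 4000,
                      base_workers: int = 4) -> int:
    return base_batch * num_workers // base_workers
```

With very short fragments, each worker contributes little temporal context per round, which can hurt advantage estimation and slow convergence, so scaling the batch alongside the workers is worth trying.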

Another option would be to try an async algorithm, such as APPO or IMPALA, which will most likely scale better than PPO at higher worker counts.
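As a hypothetical starting point, an old-style RLlib config dict for APPO might look like the fragment below. The key names and values here are assumptions for illustration; check them against your RLlib version's documentation before use.

```python
# Illustrative APPO config (values are placeholders, not recommendations):
appo_config = {
    "num_workers": 32,          # async sampling, so slow workers don't block learning
    "train_batch_size": 4000,   # still worth scaling alongside num_workers
    # APPO/IMPALA-specific knobs (e.g. learner queue size, number of SGD
    # iterations) are worth tuning separately for your environment.
}
# Then launch roughly as: ray.tune.run("APPO", config=appo_config, stop={...})
```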

