Convergence Time and num_workers

Hey All,

My team is using PPO with the default exploration settings and a stopping criterion based on the stability of the episode reward mean. When we increased the number of workers, convergence time decreased at first, but then it started increasing again.

4 workers: 100 minutes
16 workers: 76 minutes
32 workers: 53 minutes
48 workers: 105 minutes
64 workers: 118 minutes

Is it possible to decrease the convergence time any further? Could modifying the exploration config help?

cc: @sven1977 @rusu24edward @mannyv

Hey @Saurabh_Arora , great question. This kind of makes sense. PPO is a synchronous learner, meaning all workers' sample() calls are executed in parallel and collected before(!) the learning step happens on the concatenation of all the collected samples. With more workers, your rollout_fragment_length also gets adjusted (shortened) accordingly to make sure the train batch size remains as you configured it, so each worker contributes a shorter fragment and the amount of data per learning step stays the same. You can try increasing your train_batch_size parameter at the same time as you increase num_workers.
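Here is a minimal sketch of what that could look like with the old-style RLlib config dict. The env name, fragment length, batch size, and stop value are placeholders for illustration, not taken from your setup:

```python
import ray
from ray import tune

ray.init()

num_workers = 32

config = {
    "env": "CartPole-v1",            # placeholder env
    "num_workers": num_workers,
    # Keep each worker's fragment length fixed and scale the train batch
    # with the worker count; otherwise RLlib shrinks rollout_fragment_length
    # to hold train_batch_size constant.
    "rollout_fragment_length": 200,
    "train_batch_size": 200 * num_workers,
}

tune.run(
    "PPO",
    config=config,
    # Placeholder stopping criterion on the mean episode reward.
    stop={"episode_reward_mean": 195},
)
```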

Another alternative would be to try an async algo, such as APPO or IMPALA, which will most likely scale better with the number of workers than PPO.
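For example, roughly the same experiment with APPO (again with placeholder env, worker count, and stop value) could look like:

```python
from ray import tune

tune.run(
    "APPO",
    config={
        "env": "CartPole-v1",   # placeholder env
        "num_workers": 64,      # async sampling: workers don't block the learner step
    },
    stop={"episode_reward_mean": 195},  # placeholder stopping criterion
)
```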

