DD-PPO RolloutWorker Hangs

Hi guys,

I am using DD-PPO with an environment that requires a GPU.
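For context, the setup is roughly of this shape (the env name and every value below are placeholders rather than my actual settings, assuming the Ray 1.x “agents” API):

import ray
from ray.rllib.agents.ppo import DDPPOTrainer

ray.init()

# "MyGPUEnv-v0" and all numbers here are placeholders, not the real config.
trainer = DDPPOTrainer(
    env="MyGPUEnv-v0",
    config={
        "framework": "torch",            # DD-PPO is torch-only
        "num_workers": 4,                # the four rollout workers shown below
        "num_gpus_per_worker": 0.25,     # each worker needs GPU access for the env
        "num_envs_per_worker": 1,
        "rollout_fragment_length": 200,  # per-worker sample size
        "sgd_minibatch_size": 50,
        "num_sgd_iter": 10,
    },
)

for _ in range(100):
    trainer.train()  # training silently stops after a few iterations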

For testing purposes, I started training with only four rollout workers. It learns for a few iterations, but then training stops silently. When I check GPU usage after it stops, I see the following:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1     19824      C   ray::RolloutWorker.par_iter_next()          1875MiB |
|    1     19822      C   ray::RolloutWorker                          1875MiB |
|    1     19823      C   ray::RolloutWorker                          1877MiB |
|    1     19821      C   ray::RolloutWorker                          1877MiB |
+-----------------------------------------------------------------------------+

This lasts forever.

It looks to me like the worker with PID 19824 hangs inside par_iter_next() while all the other workers are waiting. The point at which the problem occurs is random: sometimes in the first iteration, sometimes later.

Do you have any thoughts on the possible cause? Any suggestions on how to debug this (e.g., how to check at which step the rollout worker hangs)? Thanks a lot!

Hey @Mark_Zhang, sorry, I’m not sure what this could be. Could you file a GitHub issue and assign it to me, with a self-sufficient reproduction script?

Thanks

Thanks @sven1977, I figured it out while I was working on the self-sufficient example.

Sorry, I left out a detail in the earlier post because I thought it was not related, but it turns out it is.

I was trying to make DD-PPO work with a multi-agent environment (agents can have early dones, and all agents share the same policy).

I removed the assertion at this line, since it does not hold for multi-agent training. I thought it shouldn’t matter, as all the sampled experience is used to train the same policy. But this causes the hang when the batches from different workers (which have different sizes) are divided into different numbers of mini-batches using the same config[“sgd_minibatch_size”]. Please see the illustration below. This also explains why the hang occurs at random times.
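A rough numeric stand-in for that illustration (hypothetical sizes, just to show the mismatch in mini-batch counts):

# Hypothetical per-worker batch sizes; with multi-agent early dones they end up unequal.
sgd_minibatch_size = 500
worker_batch_sizes = [1500, 1500, 1500, 2000]

for i, size in enumerate(worker_batch_sizes):
    num_minibatches = -(-size // sgd_minibatch_size)  # ceiling division
    print(f"worker {i}: batch of {size} -> {num_minibatches} mini-batches")

# Each mini-batch update is followed by a gradient allreduce across all workers.
# Worker 3 issues one more allreduce than workers 0-2, so it waits forever for
# peers that have already finished their SGD loop, and training hangs. Whether
# and when the batch sizes diverge varies from iteration to iteration, which is
# why the hang shows up at random times.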

My workaround is to use an adaptive mini-batch size, i.e., “batch.size/NUM_MINI_BATCH”, so that every worker ends up with the same number of mini-batches, and this worked. Any comments on this? Is there a more graceful way to handle it?
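Roughly what I mean by the adaptive size (a sketch only; NUM_MINI_BATCHES and the helper name are mine, and remainder handling plus where this plugs into the DD-PPO code are glossed over):

# Every worker derives its mini-batch size from its own batch, so all workers
# run the same fixed number of SGD mini-batches per iteration.
NUM_MINI_BATCHES = 4  # any fixed count works, as long as all workers use the same one

def adaptive_minibatch_size(batch_size: int) -> int:
    # batch_size is the per-worker sample batch size (e.g. SampleBatch.count).
    return max(1, batch_size // NUM_MINI_BATCHES)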


This looks great @Mark_Zhang! Could you provide a PR with your fix? I’m assuming it only affects DD-PPO, and if you know it’s learning in your case, we should merge this to help others use DD-PPO in a multi-agent setting.