Reproducing the DDPPO algorithm's results

Hello, I am seungju.
I have a problem reproducing results with distributed reinforcement learning algorithms such as DDPPO.
Here is my DDPPO configuration that triggers the reproducibility problem:
I found that the results cannot be reproduced if remote_worker_envs is true.

My guess is that it is not reproducible because it depends on parallel sampling, which is easily affected by hardware factors.

I wonder whether it can be made reproducible. If it cannot, I would appreciate it if you could confirm that.


Hi @seungju-mmc, unfortunately DDPPO (much like APPO) is an async algorithm, and reproducibility is not guaranteed for these algos. As you said, it depends on a lot of factors, including hardware and OS state differences.
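
To make the ordering point concrete, here is a toy sketch in plain Python. It is not RLlib code: the worker names and gradient values are made up, and the assumption is simply that contributions get accumulated in whatever order the workers happen to finish. Because floating-point addition is not associative, a different arrival order can give a slightly different sum even with identical seeds and inputs.

```python
# Hypothetical per-worker contributions (made-up values, not from any real run).
worker_grads = {"worker_0": 0.1, "worker_1": 0.2, "worker_2": 0.3}


def apply_in_arrival_order(arrival_order):
    """Accumulate contributions in the order the workers happened to finish."""
    total = 0.0
    for name in arrival_order:
        total += worker_grads[name]
    return total


# Same inputs, two different arrival orders (e.g. caused by OS scheduling).
run_a = apply_in_arrival_order(["worker_0", "worker_1", "worker_2"])
run_b = apply_in_arrival_order(["worker_2", "worker_1", "worker_0"])

print(run_a)           # 0.6000000000000001
print(run_b)           # 0.6
print(run_a == run_b)  # False
```

The difference is tiny for a single step, but over many training updates such differences can compound, so two runs with the same seed may still drift apart.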