Reproduce DDPPO algorithm's result

How severe does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity
  • Low: It annoys or frustrates me for a moment.
  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.
  • High: It blocks me to complete my task.

Hello, I am seungju.
I am having trouble reproducing results from distributed reinforcement learning algorithms such as DDPPO.
Here is the DDPPO configuration that triggers the reproducibility problem:
[image: screenshot of the DDPPO configuration, not preserved]
I found that the results cannot be reproduced if remote_worker_envs is true.
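
The setting in question can be sketched as a dict-style RLlib config. The values below are hypothetical, since the original configuration was shared as an image; only remote_worker_envs is named in this post. The other keys (framework, seed, num_workers, num_envs_per_worker) are assumed here just to make the comparison concrete.

```python
# Hypothetical values: the real configuration was not preserved.
# Only "remote_worker_envs" comes from the post itself.
base_config = {
    "framework": "torch",         # assumption: DDPPO run with PyTorch
    "seed": 0,                    # assumption: a fixed seed for every run
    "num_workers": 2,             # assumption
    "num_envs_per_worker": 4,     # assumption
    "remote_worker_envs": False,  # sub-environments stepped inside the worker
}

# Same settings, but each sub-environment is stepped in its own remote
# process. This is the variant reported above as not reproducible.
remote_env_config = {**base_config, "remote_worker_envs": True}
```

Comparing two runs of base_config against two runs of remote_env_config, with the seed held fixed, would isolate whether this one flag is what breaks reproducibility.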

My guess is that it is not reproducible because it relies on parallel sampling, whose timing is easily affected by hardware factors.
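
A toy illustration of that guess, not RLlib code: the sketch below assumes every worker produces exactly the same data on each run (as fixed seeds would give), and only the order in which the results arrive changes. The names worker_batch_means and train are made up for this example, and the update rule is a simple stand-in for a learner step.

```python
# Toy model: each worker's batch is summarized by one number. These values
# are identical in every run, standing in for identically seeded rollouts.
worker_batch_means = [1.0, 2.0, 3.0, 4.0]


def train(arrival_order, lr=0.5):
    """Apply one update per batch, in the order the batches arrive."""
    param = 0.0
    for worker_id in arrival_order:
        target = worker_batch_means[worker_id]
        # Move the parameter part of the way toward this batch's target.
        # Later batches weigh more, so the result depends on the order.
        param += lr * (target - param)
    return param


run_a = train([0, 1, 2, 3])  # one arrival order
run_b = train([0, 1, 2, 3])  # same order again
run_c = train([3, 1, 0, 2])  # same data, different arrival order

print(run_a, run_b, run_c)  # prints: 3.0625 3.0625 2.25
```

The first two runs match because the batches are consumed in the same order. The third differs even though no data changed, which is the kind of divergence that scheduling and hardware timing could introduce when samples are collected in parallel.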

Is this setup expected to be reproducible? If it is not, I would appreciate it if you could confirm that.

Thanks.

Hi @seungju-mmc, unfortunately DDPPO (much like APPO) is an async algorithm, and reproducibility is not guaranteed for these algorithms. As you said, it depends on many factors, including differences in hardware and OS state.