Mismatch between the results of PPO after upgrading to Ray 1.8.0

Hi everyone,

I recently upgraded Ray from 1.2.0 to 1.8.0 and noticed a mismatch in PPO training results. I was wondering if there are any known changes that could cause this?

More details:

  1. I binary-searched the versions; the mismatch first appears in the upgrade from 1.5.2 to 1.6.0.

  2. I initialize a network and take one iteration of PPO updates. With the same seed, the initialization is identical in 1.5.2 and 1.6.0, but the L2 norm of the network weights differs after one iteration (a minimal sketch follows this list).

  3. I am running Ray locally, not on a server, for the above (there is only 1 worker).
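
For reference, a minimal sketch of the comparison I'm running is below. The environment (CartPole-v0), framework ("torch"), and seed value are just placeholders for illustration, not my actual setup:

```python
# Run one PPO iteration with a fixed seed and compare the L2 norm of the
# policy weights before and after training. Under the same seed, the
# "init" norm matches across 1.5.2 and 1.6.0; the "post-train" norm does not.
import numpy as np
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {
    "env": "CartPole-v0",   # placeholder environment
    "framework": "torch",   # placeholder framework
    "num_workers": 1,       # single local worker, as in the report
    "seed": 0,              # same seed under both Ray versions
}

trainer = PPOTrainer(config=config)

def weight_norm(weights):
    # weights is a dict of parameter name -> numpy array
    return np.sqrt(sum(np.sum(w ** 2) for w in weights.values()))

print("init L2 norm:", weight_norm(trainer.get_policy().get_weights()))

trainer.train()  # one training iteration

print("post-train L2 norm:", weight_norm(trainer.get_policy().get_weights()))
```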

Thank you so much.

My guess: since 1.6.0, PPO uses the MultiGPUTrainOneStep() execution op where it previously used TrainOneStep(). You could set "simple_optimizer": True in your config and rerun your experiment.
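
Roughly like this; only the "simple_optimizer" key comes from the suggestion above, the other keys are illustrative:

```python
# Force the pre-1.6.0 training path (TrainOneStep instead of
# MultiGPUTrainOneStep) by enabling the simple optimizer.
config = {
    "env": "CartPole-v0",       # placeholder environment
    "framework": "torch",       # placeholder framework
    "num_workers": 1,
    "seed": 0,
    "simple_optimizer": True,   # bypass the multi-GPU train op
}
```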

Yes, that was it, thanks.
