I recently upgraded Ray to 1.8.0 (I was previously on 1.2.0) and noticed a mismatch in PPO training results between the two versions. Are there any known changes that could cause this?
I binary-searched the versions, and the mismatch first appears when upgrading from 1.5.2 to 1.6.0.
To reproduce it, I initialize a network and run one iteration of PPO updates. With the same seed, the initialization is identical in 1.5.2 and 1.6.0, but the L2 norms of the network weights differ after that one iteration.
I am running Ray locally for this, not on a cluster (there is only one worker).
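For reference, this is roughly how I compare the weight norms. The `weight_l2_norm` helper is my own; the commented-out part sketches the pre-2.0 RLlib `PPOTrainer` API (run it once under each Ray version and compare the printed norms):

```python
import numpy as np

def weight_l2_norm(weights):
    """Global L2 norm over a dict of numpy weight arrays,
    like the dict returned by policy.get_weights() in RLlib."""
    return float(np.sqrt(sum(np.sum(np.square(w)) for w in weights.values())))

# Repro sketch under Ray 1.x (single local worker, fixed seed):
#
#   import ray
#   from ray.rllib.agents.ppo import PPOTrainer
#   ray.init()
#   trainer = PPOTrainer(env="CartPole-v0",
#                        config={"seed": 0, "num_workers": 0})
#   print("init norm:", weight_l2_norm(trainer.get_policy().get_weights()))
#   trainer.train()  # one iteration of PPO updates
#   print("post-iter norm:", weight_l2_norm(trainer.get_policy().get_weights()))
```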
Thank you so much.