Hey folks,
I hooked up my custom Unity environment to Ray and trained it with PPO, but the rewards reported in TensorBoard look wrong. My environment never emits a per-step reward below -0.1, yet the values shown in TensorBoard drop below -1.0.
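For reference, this is roughly the sanity check I'm running on the raw environment outside of Ray, to confirm the per-step rewards really are bounded below by -0.1 (the `StubEnv` class here is just a stand-in for my Unity wrapper; the real env has the same gym-style `reset`/`step` interface):

```python
import random

class StubEnv:
    """Stand-in for my Unity env: per-step reward never goes below -0.1."""
    def __init__(self):
        self._t = 0

    def reset(self):
        self._t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self._t += 1
        reward = random.uniform(-0.1, 0.1)  # bounded below by -0.1
        done = self._t >= 50
        return 0.0, reward, done, {}

def min_step_reward(env, episodes=20):
    """Step the env directly and track the smallest per-step reward seen."""
    lowest = float("inf")
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            _, r, done, _ = env.step(None)
            lowest = min(lowest, r)
    return lowest

print(min_step_reward(StubEnv()))  # never below -0.1 when stepped directly
```

Stepping the env this way, the minimum reward stays at or above -0.1, so the values below -1.0 only show up once the rewards pass through RLlib's reporting.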
Is there some running reward normalization occurring?