Wrong rewards: is there some reward normalization in PPO?

Hey folks,

I hooked up my custom Unity environment to Ray and trained it with PPO, but the rewards monitored in TensorBoard look wrong. My environment never emits a per-step reward smaller than -0.1, yet TensorBoard shows values below -1.0.

Is there some running reward normalization occurring?

Looks like Ray is not usable at all.

I haven't worked with Unity and Ray myself, but it sounds like multiple timesteps (or multiple agents) are each contributing -0.1. The reward curves RLlib logs to TensorBoard (e.g. episode_reward_mean) report the return accumulated over a whole episode, not a single step, so a handful of -0.1 steps easily pushes the value below -1.0. There is no reward normalization by default.
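
A minimal sketch of the point above, assuming your environment hands out -0.1 on each of several timesteps (the 15-step episode length here is just an illustrative number, not from your setup):

```python
# Hypothetical episode: 15 timesteps, each returning the per-step minimum of -0.1.
step_rewards = [-0.1] * 15

# The episode return logged to TensorBoard is the SUM over the episode,
# not any single step's reward.
episode_return = sum(step_rewards)

print(episode_return)  # approximately -1.5, i.e. well below -1.0
```

So a value below -1.0 in TensorBoard is consistent with a per-step minimum of -0.1; it just means the episode lasted more than ten such steps.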