Hey folks,
I hooked up my custom Unity environment to Ray and trained it with PPO, but the rewards reported in TensorBoard look wrong. My environment never emits a per-step reward below -0.1, yet the values shown in TensorBoard drop below -1.0.
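For reference, this is roughly the sanity check I'm running on the raw environment outside of Ray, to confirm the per-step rewards really are bounded below by -0.1 (the `StubEnv` class here is just a stand-in for my Unity wrapper; the real env has the same gym-style `reset`/`step` interface):

```python
import random

class StubEnv:
    """Stand-in for my Unity env: per-step reward never goes below -0.1."""
    def __init__(self):
        self._t = 0

    def reset(self):
        self._t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self._t += 1
        reward = random.uniform(-0.1, 0.1)  # bounded below by -0.1
        done = self._t >= 50
        return 0.0, reward, done, {}

def min_step_reward(env, episodes=20):
    """Step the env directly and track the smallest per-step reward seen."""
    lowest = float("inf")
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            _, r, done, _ = env.step(None)
            lowest = min(lowest, r)
    return lowest

print(min_step_reward(StubEnv()))  # never below -0.1 when stepped directly
```

Stepping the env this way, the minimum reward stays at or above -0.1, so the values below -1.0 only show up once the rewards pass through RLlib's reporting.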
Is there some running reward normalization occurring?