I found the recommended way to normalize rewards in Normalize reward, which is to use a callback function.
However, the execution sequence of PPO calls is as follows:
- postprocess_ppo_gae: computes the advantages (with or without GAE) and records them in the SampleBatch.
- on_postprocess_trajectory: callback called after a policy's postprocess_fn is called.
As we can see, if I normalize the rewards in on_postprocess_trajectory, the advantages have already been computed from the raw rewards, so the normalization does not affect the advantage calculation. It seems that modifying the reward values in the on_postprocess_trajectory callback therefore has no effect on the training results.
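To make this concrete, here is a minimal sketch of the callback approach I tried, assuming the DefaultCallbacks API from ray.rllib.agents.callbacks; the class name NormalizeRewardCallbacks and the per-batch standardization are just placeholders of mine:

```python
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.policy.sample_batch import SampleBatch


class NormalizeRewardCallbacks(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        # By the time this callback runs, postprocess_ppo_gae has already
        # executed, so the advantages in the batch were computed from the
        # raw (un-normalized) rewards.
        rewards = postprocessed_batch[SampleBatch.REWARDS]
        # Per-batch standardization, used here only as an example scheme.
        postprocessed_batch[SampleBatch.REWARDS] = (
            (rewards - rewards.mean()) / (rewards.std() + 1e-8))
```

I pass this class to the trainer via config["callbacks"] = NormalizeRewardCallbacks, but as described above the rewritten rewards never feed into the advantages.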
Alternatively, I can customize postprocess_ppo_gae and normalize the rewards inside it, so that the advantages are computed from the normalized rewards. Is this method officially recommended? Or is there a better way to normalize rewards before the advantages are calculated?
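Something like the following sketch is what I have in mind, assuming an RLlib version where postprocess_ppo_gae is importable from ray.rllib.agents.ppo.ppo_tf_policy and policies/trainers can be extended with with_updates; the function and class names are my own, and the normalization scheme is again just an example:

```python
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.agents.ppo.ppo_tf_policy import PPOTFPolicy, postprocess_ppo_gae
from ray.rllib.policy.sample_batch import SampleBatch


def normalize_rewards_and_compute_gae(policy, sample_batch,
                                      other_agent_batches=None, episode=None):
    # Normalize the rewards first, so that the advantage computation
    # (GAE or plain returns) sees the normalized values.
    rewards = sample_batch[SampleBatch.REWARDS]
    sample_batch[SampleBatch.REWARDS] = (
        (rewards - rewards.mean()) / (rewards.std() + 1e-8))
    # Then run the built-in postprocessing on the modified batch.
    return postprocess_ppo_gae(policy, sample_batch,
                               other_agent_batches, episode)


NormalizedRewardPPOPolicy = PPOTFPolicy.with_updates(
    name="NormalizedRewardPPOPolicy",
    postprocess_fn=normalize_rewards_and_compute_gae,
)

NormalizedRewardPPOTrainer = PPOTrainer.with_updates(
    name="NormalizedRewardPPOTrainer",
    default_policy=NormalizedRewardPPOPolicy,
    get_policy_class=lambda config: NormalizedRewardPPOPolicy,
)
```

With this, the advantages are computed from the normalized rewards, but I am not sure whether overriding postprocess_fn like this is the intended approach.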