Normalize reward

Hi guys!
How would you scale (and clip) the reward in RLlib? I know that Stable Baselines 3 has something like that: vec_normalize.py in DLR-RM/stable-baselines3 on GitHub.

I found the RunningStat class in ray.rllib.utils.filter and tried to write an env wrapper like so:
import gym
import numpy as np

from ray.rllib.utils.filter import RunningStat


class NormalizeReward(gym.RewardWrapper):
    GAMMA = 0.9
    CLIP = 10
    EPSILON = 1e-8

    def __init__(self, env):
        super().__init__(env)

        # The reward and the discounted return are scalars, so track
        # scalar running statistics (shape ()).
        self.running_stats = RunningStat(())
        self.ret = 0.0  # running discounted return

    def reward(self, reward):
        return self._normalized_reward(reward)

    def _normalized_reward(self, reward):
        self.ret = NormalizeReward.GAMMA * self.ret + reward
        self.running_stats.push(self.ret)

        # Divide by the running std of the return; EPSILON avoids division by zero.
        reward = reward / (self.running_stats.std + NormalizeReward.EPSILON)

        # reward = np.clip(reward, -NormalizeReward.CLIP, NormalizeReward.CLIP)
        return reward

How would you solve that? And how would I make it work with parallel envs?

You could also do this in a postprocessing step (on the actual train batch); there you have the rewards from all the different sub-envs together.

Use our callbacks:
rllib/examples/custom_metrics_and_callbacks.py

and override the on_postprocess_trajectory method. In there, you can change the postprocessed_batch SampleBatch object and normalize the values under the “rewards” key.
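For example, a minimal sketch of what that could look like (assuming a Ray version where DefaultCallbacks lives in ray.rllib.agents.callbacks; the callback class name and the clip value below are just illustrative):

import numpy as np

from ray.rllib.agents.callbacks import DefaultCallbacks  # import path may differ by Ray version


class MyRewardNormCallbacks(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        # Standardize (and optionally clip) this trajectory's rewards in place.
        rewards = postprocessed_batch["rewards"]
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        postprocessed_batch["rewards"] = np.clip(rewards, -10.0, 10.0)

Then pass it in via the trainer config, e.g. config["callbacks"] = MyRewardNormCallbacks.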

Hi Sven,

I tried the option you suggested; however, the updated (normalized or otherwise changed) reward does not affect the training. I believe this is because the on_postprocess_trajectory callback is called after the obs/reward has already been processed. Is there any way to update the reward and/or observation before it is processed?

Hi Elena,
I have the same question.
Have you found an answer?