Observation and Reward Normalization

Do the observations and rewards need to be normalized if the observation values can be over 1000 and the rewards are sometimes over 100? Or does RLlib normalize the observations and rewards itself?

  • None: Just asking a question out of curiosity

@gjoliver Do you know if this is possible without connectors?

The only solution I can think of is either using env wrappers, or creating custom preprocessors and feeding them into the policy's preprocessor list via a callback on the `algorithm_init` method.
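For the env wrapper route, here is a minimal sketch, assuming a gymnasium-style env and fixed scale factors (1000 for observations, 100 for rewards) that you would adjust to your own value ranges; the env used and the registration key are just placeholders:

```python
import gymnasium as gym
import numpy as np
from ray.tune.registry import register_env


class NormalizeObsAndReward(gym.Wrapper):
    """Scales observations and rewards down by fixed factors."""

    def __init__(self, env, obs_scale=1000.0, reward_scale=100.0):
        super().__init__(env)
        self._obs_scale = obs_scale
        self._reward_scale = reward_scale
        # For a real setup you may also want to adjust self.observation_space
        # so it matches the scaled values.

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return np.asarray(obs, dtype=np.float32) / self._obs_scale, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        obs = np.asarray(obs, dtype=np.float32) / self._obs_scale
        return obs, reward / self._reward_scale, terminated, truncated, info


# Register the wrapped env under a name you can then reference in your
# AlgorithmConfig, e.g. config.environment("my_normalized_env").
register_env(
    "my_normalized_env",
    lambda env_config: NormalizeObsAndReward(gym.make("Pendulum-v1")),
)
```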

I think connectors (which will be released in 2.3) are a more elegant solution to this, but we need a couple of examples to show how this use case of normalizing obs / reward spaces would be handled with custom connectors.
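To make that concrete, here is a rough sketch of what such a custom agent connector could look like. The import paths, the `transform()` signature, and the `SampleBatch` key used below are assumptions based on the experimental agent-connector API and may differ in your Ray version; the point is that the connector (and its scale factor) is serialized together with the policy, so the normalization travels with it.

```python
# Rough sketch only; names and signatures are assumptions, not a final API.
from ray.rllib.connectors.connector import AgentConnector, ConnectorContext
from ray.rllib.policy.sample_batch import SampleBatch


class ScaleObsConnector(AgentConnector):
    """Scales incoming observations by a fixed factor before the policy sees them."""

    def __init__(self, ctx: ConnectorContext, scale: float = 1000.0):
        super().__init__(ctx)
        self._scale = scale

    def transform(self, ac_data):
        # ac_data.data is assumed to hold SampleBatch columns for one agent step.
        if SampleBatch.OBS in ac_data.data:
            ac_data.data[SampleBatch.OBS] = ac_data.data[SampleBatch.OBS] / self._scale
        return ac_data

    def to_state(self):
        # Serialized alongside the policy, so the scaling is restored with it.
        return ScaleObsConnector.__name__, {"scale": self._scale}

    @staticmethod
    def from_state(ctx: ConnectorContext, params):
        return ScaleObsConnector(ctx, **params)
```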

Agreed. Would be awesome to build this as a connector, so that the trained policy always operates in this mode regardless of the env it deals with.
An env wrapper is another way, but then you (the user) have to manage it yourself, and make sure that whenever you want to use a policy trained with these normalizations, the wrapper is applied as well.
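To illustrate that management burden: when you restore the trained policy later, you have to remember to apply the very same wrapper again by hand. A sketch, where the checkpoint path is a placeholder, `NormalizeObsAndReward` is the wrapper sketched earlier in this thread, and `Pendulum-v1` just stands in for your env:

```python
import gymnasium as gym
from ray.rllib.algorithms.algorithm import Algorithm

# Restore the trained algorithm from a checkpoint (path is a placeholder).
algo = Algorithm.from_checkpoint("/path/to/checkpoint")

# The same normalization wrapper used during training must be re-applied
# here; otherwise the restored policy sees unnormalized observations.
env = NormalizeObsAndReward(gym.make("Pendulum-v1"))

obs, _ = env.reset()
action = algo.compute_single_action(obs)
```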
