How to correctly apply observation normalization?

I’m using MeanStdFilter in my PPO example, and it works during training.
But I’m not sure whether to use filtered observations when calling trainer.compute_single_action.
In my opinion, the model was trained on filtered data, so compute_single_action should also take the filtered obs as input.
However, in my example, the action obtained from the raw (unfiltered) obs performs better than the action obtained from the filtered obs.
So, what is the correct way to apply MeanStdFilter to a trained PPO?

The config is as follows.

config = {"framework": "torch",
                   "num_workers": 2,
                   "num_gpus": 0,
                   "batch_mode": "complete_episodes",
                   "observation_filter": "MeanStdFilter",
                   # "log_level": "DEBUG",
                   # "callbacks": Callbacks4Normalization,
                   "gamma": 0.99,
                   "env_config": train_env_config, }

The code fragment is as follows.

# Inspect the filter fitted on the local worker during training
filter = agent.workers.local_worker().filters.get("default_policy")
print("filter: ", filter.running_stats.n, filter.buffer.n, filter.running_stats.mean)
# Variant A: filter the obs manually, then query the agent
# filtered_obs = filter(obs, update=False)
# action = agent.compute_single_action(filtered_obs)
# Variant B: pass the raw observation
action = agent.compute_single_action(obs)

Hi @imxuemei ,

Yes, indeed the model should do better on filtered inputs.
If you use the Algorithm’s (formerly Agent) compute_single_action, the filter is already applied for you, so you should not filter manually on top of that; otherwise the observation gets filtered twice.
This is different if you use the Policy’s compute-action methods, which do not apply any filtering.
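
To illustrate the difference, here is a minimal sketch. The names agent, obs, and the "default_policy" ID follow the snippet above; the exact return signature of Policy.compute_single_action may vary slightly between Ray versions.

# Path 1: Algorithm-level inference. The stored MeanStdFilter is applied
# internally, so pass the RAW observation.
action = agent.compute_single_action(obs)

# Path 2: Policy-level inference. No filtering happens here, so apply the
# worker's filter manually first (update=False keeps the running stats fixed).
policy = agent.get_policy("default_policy")
obs_filter = agent.workers.local_worker().filters.get("default_policy")
filtered_obs = obs_filter(obs, update=False)
action, _, _ = policy.compute_single_action(filtered_obs)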

This logic will likely soon be replaced with more intuitive modes of using your trained policy for inference. Make sure to check out upcoming Ray versions 🙂

Cheers

I got it. Thanks very much!