How to correctly apply observation normalization?

I’m using MeanStdFilter in my PPO example, and it works during training.
But I’m not sure whether to use filtered observations when calling trainer.compute_single_action.
In my opinion, the model was trained on filtered data, so compute_single_action should also take the filtered obs as input.
However, in my example, the action obtained from the raw (unfiltered) obs performs better than the action obtained from the filtered obs.
So, what is the correct way to apply MeanStdFilter to a trained PPO?

The config is as follows.

config = {"framework": "torch",
                   "num_workers": 2,
                   "num_gpus": 0,
                   "batch_mode": "complete_episodes",
                   "observation_filter": "MeanStdFilter",
                   # "log_level": "DEBUG",
                   # "callbacks": Callbacks4Normalization,
                   "gamma": 0.99,
                   "env_config": train_env_config, }

The code fragment is as follows.

# Inspect the filter fitted on the local worker during training
filter = agent.workers.local_worker().filters.get("default_policy")
print("filter: ", filter.running_stats.n, filter.buffer.n, filter.running_stats.mean)
# Variant A: filter the obs manually, then query the agent
# filtered_obs = filter(obs, update=False)
# action = agent.compute_single_action(filtered_obs)
# Variant B: pass the raw observation
action = agent.compute_single_action(obs)

Hi @imxuemei ,

Yes, indeed the model should do better on filtered inputs.
If you use the Algorithm’s (formerly Agent) compute_single_action, the filter is already applied for you, so you should not filter manually on top of that; otherwise the observation gets filtered twice.
This is different if you use the Policy’s compute-action methods, which do not apply any filtering.
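
To illustrate the difference, here is a minimal sketch. The names agent, obs, and the "default_policy" ID follow the snippet above; the exact return signature of Policy.compute_single_action may vary slightly between Ray versions.

# Path 1: Algorithm-level inference. The stored MeanStdFilter is applied
# internally, so pass the RAW observation.
action = agent.compute_single_action(obs)

# Path 2: Policy-level inference. No filtering happens here, so apply the
# worker's filter manually first (update=False keeps the running stats fixed).
policy = agent.get_policy("default_policy")
obs_filter = agent.workers.local_worker().filters.get("default_policy")
filtered_obs = obs_filter(obs, update=False)
action, _, _ = policy.compute_single_action(filtered_obs)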

This logic will likely soon be replaced with more intuitive modes of using your trained policy for inference. Make sure to check out upcoming Ray versions 🙂

Cheers

I got it. Thanks very much!