I'm using MeanStdFilter in my PPO example, and it works during training.
However, I'm not sure whether to pass filtered observations when calling trainer.compute_single_action.
In my understanding, the model was trained on filtered data, so compute_single_action should also take the filtered obs as input.
Yet in my example, the action computed from the unfiltered obs performs better than the action computed from the filtered obs.
So what is the correct way to apply MeanStdFilter when evaluating a trained PPO agent?
The config is as follows:

```python
config = {
    "framework": "torch",
    "num_workers": 2,
    "num_gpus": 0,
    "batch_mode": "complete_episodes",
    "observation_filter": "MeanStdFilter",
    # "log_level": "DEBUG",
    # "callbacks": Callbacks4Normalization,
    "gamma": 0.99,
    "env_config": train_env_config,
}
```
The code fragment is as follows:

```python
filter = agent.workers.local_worker().filters.get("default_policy")
print("filter: ", filter.running_stats.n, filter.buffer.n, filter.running_stats.mean)

# Variant 1: filter the observation manually before querying the policy
# filtered_obs = filter(obs, update=False)
# action = agent.compute_single_action(filtered_obs)

# Variant 2: pass the raw observation
action = agent.compute_single_action(obs)
```
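For context, here is a minimal standalone sketch of what a MeanStdFilter-style normalizer does: it maintains running statistics and maps observations to roughly zero mean and unit variance. This is my own illustration (class and method names are hypothetical), not RLlib's actual MeanStdFilter implementation:

```python
import numpy as np

class RunningMeanStd:
    """Running mean/std tracker using Welford's online algorithm."""

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # sum of squared deviations from the mean

    def update(self, x):
        """Fold one observation into the running statistics."""
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Guard against n < 2 and zero-variance features.
        if self.n < 2:
            return np.ones_like(self.mean)
        return np.maximum(np.sqrt(self.m2 / (self.n - 1)), 1e-8)

    def normalize(self, x):
        """Apply the filter without updating the stats (inference mode)."""
        return (np.asarray(x, dtype=np.float64) - self.mean) / self.std


rms = RunningMeanStd(shape=(2,))
for obs in [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]:
    rms.update(obs)
norm_obs = rms.normalize([2.0, 20.0])  # the mean observation maps to ~[0, 0]
```

The key point the sketch makes concrete: if such a filter were applied twice to the same observation (once manually and once inside the trainer), the policy would see doubly shifted and rescaled inputs, which could explain the performance difference I observe.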