I noticed that when a policy is trained with "observation_filter": "MeanStdFilter",
the filter's mean and standard deviation keep being updated after training, when the policy is served (i.e. by a policy server): every call to ExternalEnv.get_action
updates them.
I could reproduce this with Ray 1.6.0 and 1.13.0.
Is there a way to freeze the filter so as to get fully deterministic behavior?
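To illustrate the behavior, here is a minimal sketch of a running mean/std observation filter with an on/off "frozen" switch. The class RunningMeanStdFilter and its frozen attribute are hypothetical and not RLlib's actual MeanStdFilter; it only assumes the filter keeps running statistics (Welford's algorithm) that change on every call unless updates are disabled.

```python
import math


class RunningMeanStdFilter:
    """Hypothetical stand-in for an observation filter (not RLlib's class)."""

    def __init__(self):
        self.n = 0          # number of observations seen so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations (Welford)
        self.frozen = False # when True, statistics are no longer updated

    def _std(self):
        # Sample standard deviation; fall back to 1.0 with too few samples.
        if self.n < 2:
            return 1.0
        return math.sqrt(self.m2 / (self.n - 1))

    def __call__(self, x):
        # Update the running statistics only while the filter is not frozen.
        if not self.frozen:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        # Normalize the observation with the current statistics.
        return (x - self.mean) / (self._std() + 1e-8)


# "Training": the filter accumulates statistics.
f = RunningMeanStdFilter()
for obs in [1.0, 2.0, 3.0, 4.0]:
    f(obs)

# Serving without freezing: the same observation maps to different values.
unfrozen = RunningMeanStdFilter()
for obs in [1.0, 2.0, 3.0, 4.0]:
    unfrozen(obs)
a, b = unfrozen(10.0), unfrozen(10.0)

# Serving with a frozen filter: the same observation always maps to the same value.
f.frozen = True
c, d = f(2.5), f(2.5)
print(a, b, c, d)
```

With the filter unfrozen, a and b differ because each call shifts the mean and standard deviation; once frozen, c and d are identical, which is the deterministic serving behavior being asked about.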