Is there a way to “normalize observations” automatically in RLlib?
In particular, I am interested in centering observations around the mean and scaling by the standard deviation, as follows:
new observation = (old observation - running mean) / running std dev
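For concreteness, a tiny NumPy illustration of the transform I mean (the stats here are just stand-ins for running statistics):

import numpy as np

obs = np.array([2.0, 4.0, 6.0])
running_mean, running_std = obs.mean(), obs.std()  # stand-ins for running stats
new_obs = (obs - running_mean) / (running_std + 1e-8)  # -> roughly [-1.22, 0., 1.22]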
If there is a way to normalize observations, where in the “config” dictionary should it be passed and how?
Thanks!
config = (
    PPOConfig()
    .training(lr=5e-3, num_sgd_iter=10, train_batch_size=256)
    .framework("torch")
    .rollouts(num_rollout_workers=1)
    .resources(num_gpus=0, num_cpus_per_worker=1)
    .environment(
        env=env_name,
        env_config={  # env_config: arguments passed to the Env
            "num_workers": N_CPUS - 1,  # number of parallel workers
            "disable_env_checking": True,
        },
    )
)
luzgui
December 21, 2022, 4:49pm
2
Hi, I am having the same question. Have you found out the answer?
Best
Hi! Yes, it is just a matter of applying this configuration:
"observation_filter": "MeanStdFilter",
I took the example from here:
I’m using MeanStdFilter in my PPO example. It works during the training process.
But I’m not sure whether to use filtered observations when calling trainer.compute_single_action.
In my opinion, the model was trained based on filtered data, and thus the compute_single_action function should also take the filtered obs as the input.
However, in my example, the action obtained without the filtered obs performs better than the action obtained with filtered obs.
Then, what is the correct way to …
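(For what it’s worth, if you do want to apply the trained filter manually at inference time, something like the following should work. This is an untested sketch assuming Ray 2.x, where the local rollout worker stores per-policy filters; algo is a trained Algorithm and raw_obs a raw env observation. Also worth verifying for your Ray version whether compute_single_action already applies the filter internally:)

# Fetch the filter learned during training and apply it without
# updating its statistics, then query the trained algorithm.
flt = algo.workers.local_worker().filters["default_policy"]
filtered_obs = flt(raw_obs, update=False)  # do not update stats at eval time
action = algo.compute_single_action(filtered_obs, explore=False)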
Here is the source code, to make sure it’s applying what you need:
import logging
import threading

import numpy as np

logger = logging.getLogger(__name__)


class Filter:
    """Processes input, possibly statefully."""

    def apply_changes(self, other, *args, **kwargs):
        """Updates self with "new state" from other filter."""
        raise NotImplementedError

    def copy(self):
        """Creates a new object with same state as self.

        Returns:
            A copy of self.
        """
(File truncated; see the original source for the full implementation.)
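Conceptually, the MeanStdFilter keeps running statistics and applies the (obs - mean) / std transform. A simplified sketch of that idea (a Welford-style update; not the actual RLlib code):

import numpy as np

class RunningMeanStd:
    """Simplified running mean/std tracker, for intuition only."""

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # running sum of squared deviations

    def update(self, x):
        # Welford's online update for mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean = self.mean + delta / self.n
        self.m2 = self.m2 + delta * (x - self.mean)

    def normalize(self, x):
        std = np.sqrt(self.m2 / max(self.n - 1, 1)) + 1e-8
        return (x - self.mean) / std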
And this GitHub issue is where I got my answer from:
(GitHub issue, opened 10 Jul 2020, closed 10 Jan 2021; labels: question, stale)
### What is your question?
I want to normalize my observations without knowing the exact range up front; hence, I think using a running mean for normalization would be best. I only want to apply this normalization to parts of my dict observation space.
What's the recommended way to do that?
The [RLlib documentation](https://docs.ray.io/en/latest/rllib-models.html#custom-preprocessors) points to [Gym wrappers](https://github.com/openai/gym/tree/master/gym/wrappers), but I didn't see any wrapper for running mean normalization of observations (possible that I missed something).
Other frameworks have their own utility class for this, e.g., [OpenAI baselines](https://github.com/openai/baselines/blob/master/baselines/common/vec_env/vec_normalize.py) and [stable baselines](https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize).
Does RLlib have something similar that I could use out of the box? Or do I need to implement it myself (what would be the best starting point)?
Hope this helps! Cheers.
luzgui
December 21, 2022, 5:53pm
4
Thank you very much. The problem is that I am using action masking, and the filter will also normalize the mask, so I am having the issue described in this other thread:
I’m attempting to use the MeanStdFilter observation filter with an environment that uses action masking and I believe the filter is also normalizing the action mask. I’m using ray 0.8.5 with tensorflow 1.15.4. Here is a script to recreate the issue:
import argparse
import random
import numpy as np
import gym
from gym.spaces import Box, Discrete, Dict, Tuple
import ray
from ray import tune
from ray.rllib.models import ModelCatalog
from ray.rllib.models.tf.fcnet_v2 import FullyConnectedNetwork…
Oh, maybe open another thread then, as I haven’t faced that one yet.
Cheers!
mannyv
December 22, 2022, 11:14pm
6
Hi @luzgui,
I do not have a solution for you, but I do have an idea for a workaround that might work.
If your mask consists of 0s and 1s, then after normalization you should still have exactly two unique values in the mask, so maybe you could reconstruct the mask with something like:
new_mask = (old_mask == old_mask.max()).float()
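A quick self-contained check of that idea (a torch sketch; the filter’s effect is emulated here with a plain mean/std shift):

import torch

old_mask = torch.tensor([0.0, 1.0, 1.0, 0.0])
# Emulate what MeanStdFilter does to the mask: shift and rescale it.
normalized = (old_mask - old_mask.mean()) / old_mask.std()
# The mask is still two-valued, so the max identifies the original 1s.
new_mask = (normalized == normalized.max()).float()
print(new_mask)  # tensor([0., 1., 1., 0.])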