Custom preprocessors and original_space variable

Lucas · March 8, 2021, 6:48pm

Hello

I posted some time ago on Slack, on a topic related to preprocessors.

Since then, I have decided to work on my fork to try to support custom preprocessors. Indeed, from what I understand, custom preprocessors are still deprecated.

Now I have a question about the preprocessor.observation_space.original_space variable. Is there a reason for that if condition to exist:

ray-project/ray/blob/master/rllib/models/preprocessors.py#L89



    @property
    @PublicAPI
    def observation_space(self) -> gym.Space:
        obs_space = gym.spaces.Box(-1., 1., self.shape, dtype=np.float32)
        # Stash the unwrapped space so that we can unwrap dict and tuple spaces
        # automatically in modelv2.py
        classes = (DictFlatteningPreprocessor, OneHotPreprocessor,
                   RepeatedValuesPreprocessor, TupleFlatteningPreprocessor)
        if isinstance(self, classes):
            obs_space.original_space = self._obs_space
        return obs_space


class GenericPixelPreprocessor(Preprocessor):
    """Generic image preprocessor.

    Note: for Atari games, use config {"preprocessor_pref": "deepmind"}
    instead for deepmind-style Atari preprocessing.
    """

In my case, I have found that the original_space attribute is not set, and this causes the driver’s policy to have a different observation space from the workers’ policies:

ray-project/ray/blob/master/rllib/evaluation/worker_set.py#L83


self._remote_workers = []
self.add_workers(num_workers)

# If num_workers > 0, get the action_spaces and observation_spaces
# to not be forced to create an Env on the local worker.
if self._remote_workers:
    remote_spaces = ray.get(self.remote_workers(
    )[0].foreach_policy.remote(
        lambda p, pid: (pid, p.observation_space, p.action_space)))
    spaces = {
        e[0]: (getattr(e[1], "original_space", e[1]), e[2])
        for e in remote_spaces
    }
else:
    spaces = None

# Always create a local worker.
self._local_worker = self._make_worker(
    cls=RolloutWorker,
    env_creator=env_creator,
    validate_env=validate_env,

which later throws an error.
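To make the mismatch concrete, here is a minimal sketch of the `getattr(..., "original_space", ...)` fallback from the excerpt above. It does not use RLlib or gym: `Space`, `preprocessed_space` and `space_seen_by_driver` are hypothetical stand-ins, and only the `getattr` fallback itself is taken from the quoted code.

```python
class Space:
    """Hypothetical stand-in for gym.Space; it only carries a label."""

    def __init__(self, name):
        self.name = name


def preprocessed_space(original, stash_original):
    """Mimic Preprocessor.observation_space: return a flattened space and
    attach the original one only when stash_original is True (which is what
    the isinstance check decides in the quoted code)."""
    flat = Space("Box(flattened)")
    if stash_original:
        flat.original_space = original
    return flat


def space_seen_by_driver(policy_space):
    # Same fallback as in the worker_set.py excerpt: use original_space if it
    # exists, otherwise fall back to the (already flattened) policy space.
    return getattr(policy_space, "original_space", policy_space)


env_space = Space("Dict(env)")
default_case = space_seen_by_driver(preprocessed_space(env_space, True))
custom_case = space_seen_by_driver(preprocessed_space(env_space, False))

print(default_case.name)  # Dict(env)
print(custom_case.name)   # Box(flattened)
```

With a built-in preprocessor the driver gets the env's own space back; with a custom one it gets the flattened space instead, which matches the difference described above.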

I am confused. Is there something that I am missing?

I could see that @sven1977 worked on that.

sven1977 · March 11, 2021, 8:53am

Yeah, it’s because your custom preprocessor is not one of the “default” ones:

    classes = (DictFlatteningPreprocessor, OneHotPreprocessor,
               RepeatedValuesPreprocessor, TupleFlatteningPreprocessor)

That’s why no original_space property is generated. As a workaround, you could add your custom preprocessor class to that tuple and see whether this would solve it.

However, it is better to use a gym env wrapper to manipulate and preprocess your observations (or actions, rewards, etc.).

For example, this script uses a wrapper to make an observation one-hot:
ray/rllib/utils/exploration/tests/test_curiosity.py
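For readers who do not have that script at hand, a wrapper of this kind could look roughly like the sketch below. This is not the code from the referenced script: `OneHotObservation` and `TinyEnv` are hypothetical names, and the sketch assumes the env's observation space is a `gym.spaces.Discrete`.

```python
import gym
import numpy as np


class OneHotObservation(gym.ObservationWrapper):
    """Turn integer observations from a Discrete space into one-hot vectors."""

    def __init__(self, env):
        super().__init__(env)
        # Assumption: the wrapped env emits integers from a Discrete space.
        assert isinstance(env.observation_space, gym.spaces.Discrete)
        self.n = env.observation_space.n
        # Advertise the new space so the policy sees what it will receive.
        self.observation_space = gym.spaces.Box(
            0.0, 1.0, (self.n,), dtype=np.float32)

    def observation(self, obs):
        # Called by ObservationWrapper on every observation from the env.
        one_hot = np.zeros(self.n, dtype=np.float32)
        one_hot[int(obs)] = 1.0
        return one_hot


class TinyEnv(gym.Env):
    """Hypothetical placeholder env; it only defines its spaces."""

    def __init__(self):
        self.observation_space = gym.spaces.Discrete(4)
        self.action_space = gym.spaces.Discrete(2)


wrapped = OneHotObservation(TinyEnv())
encoded = wrapped.observation(2)
print(encoded)                          # [0. 0. 1. 0.]
print(wrapped.observation_space.shape)  # (4,)
```

Because the wrapper replaces `observation_space` itself, the policy is built against the one-hot space directly and no custom preprocessor is involved.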

Lucas · March 11, 2021, 12:49pm

Thank you for your answer.

Yes, I got that part. I am indeed using this workaround. My question was rather: why does the behavior differ depending on whether the preprocessor is custom or not? Maybe it’s related to some code located somewhere else in RLlib, which I missed?

Also, I understand your point about using wrappers instead of preprocessors, which is also mentioned in the docs. However, one may want to run preprocessing more generally on SampleBatch data from any sampler, e.g. rllib.offline.input_reader.InputReader samplers, as depicted in ray/rllib-components.svg at master · ray-project/ray · GitHub. This can’t be done with wrappers.

sven1977 · March 12, 2021, 7:39pm

Great question (why do we still have default preprocessors, but don’t allow custom ones anymore)!

We should indeed unify this and get rid of the default ones as well. The only reason they are still there in fact is to make sure any observations get stored in sample batches as one single tensor (flattening of Tuples/Dicts) and it would be some work to change that RLlib-wide.
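As a rough illustration of what "one single tensor" means here, the sketch below flattens a nested dict/tuple observation into a single 1-D array. It is not RLlib's implementation: `flatten_observation` is a hypothetical helper, the sorted-key ordering is an assumption made for determinism, and no one-hot encoding of discrete components is done.

```python
import numpy as np


def flatten_observation(obs):
    """Recursively concatenate a nested dict/tuple observation into one
    1-D float32 array (hypothetical helper, not RLlib code)."""
    if isinstance(obs, dict):
        # Assumption: iterate keys in sorted order so the layout is stable.
        parts = [flatten_observation(obs[key]) for key in sorted(obs)]
    elif isinstance(obs, (tuple, list)):
        parts = [flatten_observation(item) for item in obs]
    else:
        # Leaf: a scalar or array, flattened to one dimension.
        return np.asarray(obs, dtype=np.float32).reshape(-1)
    return np.concatenate(parts)


obs = {
    "position": np.array([0.5, -0.5]),
    "velocity": (1.0, np.array([2.0, 3.0])),
}
flat = flatten_observation(obs)
print(flat.shape)  # (5,)
```

Storing every observation in this flat form is what lets a sample batch hold observations as one array, at the cost of needing the original space to unpack them again later.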

On the other question: you can actually tell your offline readers to postprocess the batches read from the offline files (set the config key postprocess_inputs=True) and then define a custom on_postprocess_trajectory callback, in which you can do batched processing of your data before it is used for training.

Like so:

config:
    callbacks: [your custom sub-class of the `DefaultCallbacks` class, in which you override `on_postprocess_trajectory`].
    postprocess_inputs: true

Lucas · March 15, 2021, 4:57pm

I understood that, currently and in the future, the intended role of preprocessors is just to transform the observations into the right format for the policy network.

Thank you again for your detailed and kind answers!
