Custom preprocessors and original_space variable


I posted some time ago on Slack, on a topic related to preprocessors.

Since then, I have decided to work on my fork to try to support custom preprocessors, since, from what I understand, they are still deprecated.

Now I have a question about the `preprocessor.observation_space.original_space` attribute. Is there a reason for this if condition to exist:

In my case, the original_space attribute is missing, which causes the driver's policy's observation space to differ from the workers':

which later throws an error.

I am confused. Is there something that I am missing?

I could see that @sven1977 worked on that.

Yeah, it’s because your custom preprocessor is not one of the “default” ones:

    classes = (DictFlatteningPreprocessor, OneHotPreprocessor,
               RepeatedValuesPreprocessor, TupleFlatteningPreprocessor)

That’s why no original_space property is generated. As a workaround, you could add your custom preprocessor class to that tuple and see whether this would solve it.

However, you should rather use a gym env wrapper to manipulate and preprocess your observations (or actions, rewards, etc.).

Like e.g. in this example script to make an observation one-hot:
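To make the wrapper approach concrete, here is a minimal sketch of a gym-style `ObservationWrapper` that one-hot encodes a Discrete observation before it ever reaches RLlib. The tiny stand-in base class keeps the sketch self-contained; in real code you would subclass `gym.ObservationWrapper` and also set `self.observation_space` to the matching `Box` space. All names here are illustrative, not from the linked example script.

```python
class ObservationWrapper:
    """Minimal stand-in for gym.ObservationWrapper (sketch only)."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.observation(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.observation(obs), reward, done, info

    def observation(self, obs):
        raise NotImplementedError


class OneHotObsWrapper(ObservationWrapper):
    """One-hot encode an integer (Discrete) observation of size n."""

    def __init__(self, env, n):
        super().__init__(env)
        self.n = n  # number of discrete observation values

    def observation(self, obs):
        one_hot = [0.0] * self.n
        one_hot[obs] = 1.0
        return one_hot
```

Because the transformation happens inside the env wrapper, the policy (and any preprocessor logic) only ever sees the already-transformed observation space.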

Thank you for your answer.

Yes, I got that part. I am indeed using this workaround; my question was rather: what is the reason for such different behavior depending on whether the preprocessor is custom or not? Maybe it's related to code located somewhere else in RLlib that I missed?

Also, I understand your point about using wrappers instead of preprocessors, which is also mentioned in the docs. However, one may want to run preprocessing more generally on SampleBatch data coming from any sampler, e.g. rllib.offline.input_reader.InputReader samplers, as depicted in ray/rllib-components.svg at master · ray-project/ray · GitHub. This cannot be done with wrappers.

Great question (why do we still have default preprocessors, but don’t allow custom ones anymore)!

We should indeed unify this and get rid of the default ones as well. In fact, the only reason they are still there is to make sure any observations get stored in sample batches as one single tensor (flattening of Tuples/Dicts), and it would be some work to change that RLlib-wide.
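For context, the flattening in question means a nested Dict/Tuple observation gets concatenated into a single flat vector before being stored in a SampleBatch. A rough, stdlib-only sketch of the idea (function name is hypothetical; RLlib's actual DictFlatteningPreprocessor/TupleFlatteningPreprocessor work on numpy arrays and use the space definition to determine the layout):

```python
def flatten_obs(obs):
    """Recursively flatten a dict/tuple/list observation into one flat
    list of floats, conceptually mirroring how RLlib stores a single
    tensor per observation. Dict keys are sorted for a stable order."""
    if isinstance(obs, dict):
        out = []
        for key in sorted(obs):
            out.extend(flatten_obs(obs[key]))
        return out
    if isinstance(obs, (tuple, list)):
        out = []
        for item in obs:
            out.extend(flatten_obs(item))
        return out
    return [float(obs)]
```

For example, `{"pos": [1, 2], "vel": (3,)}` would be stored as the single flat vector `[1.0, 2.0, 3.0]`.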

On the other question: you can actually tell your offline readers to postprocess the read-in batches from the offline files (set the config key postprocess_inputs=True) and then define a custom on_postprocess_trajectory callback, in which you can do batched processing of your data before it is used for training.

Like so:

    callbacks: [your custom sub-class of the `DefaultCallbacks` class, in which you override `on_postprocess_trajectory`]
    postprocess_inputs: true
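A sketch of what such a callback sub-class might look like. The stand-in base class makes the snippet self-contained; in real code you would import and subclass RLlib's `DefaultCallbacks` instead, and the rescaling logic here is purely illustrative.

```python
class DefaultCallbacks:
    """Stand-in for ray.rllib's DefaultCallbacks (sketch only)."""

    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        pass


class MyPreprocessingCallbacks(DefaultCallbacks):
    """Batched preprocessing of trajectory data before training."""

    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        # Illustrative transform: rescale raw pixel observations to [0, 1].
        postprocessed_batch["obs"] = [
            [x / 255.0 for x in obs] for obs in postprocessed_batch["obs"]
        ]
```

With postprocess_inputs=True, batches read by the offline input reader are run through this trajectory postprocessing as well, so the same batched transform applies to offline data.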

So, as I understand it, both now and going forward, the intended role of preprocessors is just to transform the observations into the right format for the policy network.

Thank you again for your detailed and kind answers!
