Arbitrary action/observation space

Using FlexDict and Repeated spaces can provide great flexibility, but sometimes it’s still not enough.

I would assume that any object could be used for observations (and actions) as long as the Ray backend can handle it. The space would still require a sample() method, which might not be trivial to implement (but that should be the user’s responsibility anyway).

So are all these constraints really required, or could there be a simpler path that passes observations from the envs to the model without preprocessing, flattening, validating, etc.?

E.g.

import gym
import numpy as np

from ray.rllib.models.preprocessors import Preprocessor


class NoPreproc(Preprocessor):
    def _init_shape(self, obs_space: gym.Space, options: dict):
        return obs_space.shape

    def transform(self, observation):
        return observation

    # Is this necessary?
    # def write(self, observation, array,
    #           offset: int) -> None:
    #     array[offset:offset + self._size] = np.array(observation,
    #                                                  copy=False).ravel()

    @property
    def observation_space(self) -> gym.Space:
        return self._obs_space


class CustomSpace(gym.Space):
    def __init__(self):
        super().__init__()
        self._shape = 1 # required for preproc?
        self.max_len = 10

    def sample(self):
        size = np.random.randint(1, self.max_len)
        return np.random.rand(size)

    # OR
    # def sample(self):
    #     return custom_object

    def contains(self, x):
        return True

but this raises the following:

(pid=1661960)   File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task.function_executor
(pid=1661960)   File "/home/user/miniconda3/envs/mahenv/lib/python3.8/site-packages/ray/_private/function_manager.py", line 566, in actor_method_executor
(pid=1661960)     return method(__ray_actor, *args, **kwargs)
(pid=1661960)   File "/home/user/miniconda3/envs/mahenv/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 516, in __init__
(pid=1661960)     self.policy_map, self.preprocessors = self._build_policy_map(
(pid=1661960)   File "/home/user/miniconda3/envs/mahenv/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1127, in _build_policy_map
(pid=1661960)     preprocessor = ModelCatalog.get_preprocessor_for_space(
(pid=1661960)   File "/home/user/miniconda3/envs/mahenv/lib/python3.8/site-packages/ray/rllib/models/catalog.py", line 638, in get_preprocessor_for_space
(pid=1661960)     prep = _global_registry.get(RLLIB_PREPROCESSOR, preprocessor)(
(pid=1661960)   File "/home/user/miniconda3/envs/mahenv/lib/python3.8/site-packages/ray/tune/registry.py", line 135, in get
(pid=1661960)     value = _internal_kv_get(_make_key(category, key))
(pid=1661960)   File "/home/user/miniconda3/envs/mahenv/lib/python3.8/site-packages/ray/tune/registry.py", line 105, in _make_key
(pid=1661960)     key.encode("ascii"))
(pid=1661960) AttributeError: type object 'NoPreproc' has no attribute 'encode'

Actually that error was because I didn’t register the preprocessor, so ignore that.
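For reference, registering the preprocessor would look roughly like this (the name "no_preproc" is arbitrary; this sketch assumes the ModelCatalog API of Ray 1.x):

```python
# Hedged sketch: register the NoPreproc class from above so RLlib can
# look it up by name instead of by the class object itself.
from ray.rllib.models import ModelCatalog

ModelCatalog.register_custom_preprocessor("no_preproc", NoPreproc)

config = {
    "model": {
        "custom_preprocessor": "no_preproc",
    },
}
```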
However, RLlib still expects NumPy arrays, e.g. in rllib/policy/policy.py:

ret[view_col] = np.zeros_like([
    view_req.space.sample() for _ in range(batch_size)
])

which only makes sense when sample() returns a NumPy array of fixed shape.
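A minimal repro of the problem, using plain NumPy (no RLlib): stacking variable-length samples, as CustomSpace.sample() produces, is not a valid ndarray.

```python
import numpy as np

# Variable-length samples, like CustomSpace.sample() returns.
samples = [np.random.rand(3), np.random.rand(5)]
try:
    batch = np.zeros_like(samples)
except ValueError as exc:
    # Recent NumPy refuses to build a ragged array outright;
    # older versions fall back to an object array with a warning.
    print("zeros_like failed:", exc)

# With fixed-shape samples the same call works as policy.py expects.
fixed = [np.random.rand(4) for _ in range(2)]
print(np.zeros_like(fixed).shape)  # (2, 4)
```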

Would it be possible to move the arbitrary action space to an action mask? That way, the action space can be constant, which is a fundamental assumption for RL algorithms.

To be precise, make a wrapper Env whose action space has size self.max_len and, when stepping through the environment, apply a mask that zeroes out the invalid indices of the action space.
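In plain NumPy, the masking part of that wrapper could be sketched like this (mask_action and MAX_LEN are illustrative names; MAX_LEN plays the role of self.max_len):

```python
import numpy as np

MAX_LEN = 10  # illustrative; corresponds to CustomSpace.max_len above


def mask_action(action: np.ndarray, valid_len: int) -> np.ndarray:
    """Zero out entries beyond the currently valid prefix of the action."""
    mask = np.zeros(MAX_LEN)
    mask[:valid_len] = 1.0
    return action * mask


# The policy always emits a fixed-size action; the wrapper masks it.
action = np.ones(MAX_LEN)
print(mask_action(action, 4))  # first 4 entries kept, rest zeroed
```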

Thanks for the suggestion. I think the Repeated space naturally provides similar functionality, which does help a bit, but in general things would be easier if there were no need to encode everything as a NumPy array.