[gym] How to design "truncated" for a custom env

PhilippWillms · May 14, 2023, 4:02pm

I am thinking about how to properly use the “truncated” flag, which was introduced in the later gym releases and is fully supported in Gymnasium. In particular, I see a chance to use it in action-masked environments. Typically, once an action has been taken, it cannot be taken again in the same episode. Does it make sense to design the step() method so that it flags truncated in such cases?

Rohan138 · May 23, 2023, 6:12pm

No, the truncated flag is meant for cases where the environment is stopped early, e.g. because it hit a user-defined limit on episode length, even though the environment itself did not terminate. It is unrelated to action masking, so setting `truncated=True` would be incorrect for the use case you mentioned. I would refer to the Gymnasium docs on action masking instead.
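To make the distinction concrete, here is a minimal, stdlib-only sketch of a custom environment whose `step()` follows the Gymnasium-style `(obs, reward, terminated, truncated, info)` return signature. It does not subclass `gymnasium.Env`, and the class name `UseOnceEnv`, the `max_episode_steps` parameter and the reward values are hypothetical. Already-used actions are reported through an `info["action_mask"]` entry (the info-key convention the gym PR linked in this reply adds to Taxi), `terminated` is set when no legal actions remain, and `truncated` is set only when the step limit is reached.

```python
from typing import Any, Dict, List, Tuple


class UseOnceEnv:
    """Hypothetical env in which each of n_actions may be taken at most once per episode.

    It mimics Gymnasium's step() return signature (obs, reward, terminated,
    truncated, info) without importing gymnasium.
    """

    def __init__(self, n_actions: int = 4, max_episode_steps: int = 3) -> None:
        self.n_actions = n_actions
        self.max_episode_steps = max_episode_steps  # user-defined length limit
        self.used: List[bool] = [False] * n_actions
        self.steps = 0

    def _mask(self) -> List[int]:
        # Positive mask: 1 = action still allowed, 0 = already used this episode
        return [0 if u else 1 for u in self.used]

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.used = [False] * self.n_actions
        self.steps = 0
        return 0, {"action_mask": self._mask()}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        if self.used[action]:
            # Respecting the mask is the agent's job; reject violations loudly
            raise ValueError(f"action {action} was already taken this episode")
        self.used[action] = True
        self.steps += 1
        obs = sum(self.used)  # observation: number of actions used so far
        reward = 1.0
        # terminated: the task itself is over (no legal actions remain)
        terminated = all(self.used)
        # truncated: the external step limit was reached; independent of the mask
        truncated = self.steps >= self.max_episode_steps
        return obs, reward, terminated, truncated, {"action_mask": self._mask()}


env = UseOnceEnv(n_actions=4, max_episode_steps=3)
obs, info = env.reset()
for a in (2, 0, 1):
    obs, reward, terminated, truncated, info = env.step(a)
print(info["action_mask"], terminated, truncated)  # [0, 0, 0, 1] False True
```

In the demo, the episode is truncated after three steps even though action 3 is still legal, which is exactly the case `truncated` exists for; the mask only tells the agent which actions are allowed. Raising on a repeated action is one design option; an environment could instead apply a penalty and leave its state unchanged.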

github.com/openai/gym

Added Action masking for Space.sample()

openai:master ← pseudo-rnd-thoughts:action-masking

opened 01:21PM - 17 Jun 22 UTC

pseudo-rnd-thoughts

+565 -74

Adds action masking, as requested in https://github.com/openai/gym/issues/2823, to allow spaces to mask certain actions. These masks are the positive case, where `1` means that the action can be taken and `0` that it cannot. For all of the gym spaces, this PR adds a parameter `sample(mask=...)`, with the particular type required depending on the space. `Box` is a special case where we don't implement masking, because the neural network cannot provide values for continuous distributions; however, if a good reason is found, this could be added. To the gym Taxi environment, we add a new info key `"action_mask"`, which is the recommended way to use masking in custom environments.

### Example masks

```python
>>> import numpy as np
>>> from gym import spaces

# Box space doesn't have masks
>>> space = spaces.Discrete(4)
>>> space.sample(mask=np.array([0, 1, 1, 1], dtype=np.int8))
2
>>> space.sample(mask=np.array([0, 0, 0, 0], dtype=np.int8))
0
>>> space = spaces.MultiDiscrete([4, 2])
>>> space.sample(mask=(np.array([0, 1, 0, 1], dtype=np.int8), np.array([0, 0], dtype=np.int8)))
[1 0]
>>> space = spaces.MultiDiscrete(np.array([[4, 2], [3, 4]]))
>>> space.sample(mask=((np.array([1, 1, 1, 1], dtype=np.int8), np.array([0, 1], dtype=np.int8)), (np.array([0, 0, 0], dtype=np.int8), np.array([1, 1, 0, 0], dtype=np.int8))))
[[2 1]
 [0 1]]
>>> space = spaces.MultiBinary([2, 3])
>>> space.sample(mask=np.array([[0, 0, 1], [1, 1, 0]], dtype=np.int8))
[[0 0 0]
 [1 1 0]]

# Composite spaces (Dict, Tuple and Graph)
>>> space = spaces.Dict(a=spaces.Discrete(3), b=spaces.Box(0, 1, (1,)))
>>> space.sample(mask={"a": np.array([0, 1, 1], dtype=np.int8), "b": None})
OrderedDict([('a', 1), ('b', array([0.6812336], dtype=float32))])
>>> space = spaces.Tuple((spaces.Box(0, 1, (1,)), spaces.Discrete(3)))
>>> space.sample(mask=(None, np.array([0, 0, 0], dtype=np.int8)))
(array([0.74909943], dtype=float32), 0)
>>> space = spaces.Graph(node_space=spaces.Box(0, 1, (1,)), edge_space=spaces.Discrete(3))
>>> space.sample(mask=(None, np.array([0, 1, 1], dtype=np.int8)), num_nodes=4)
GraphInstance(nodes=array([[0.5791068 ], [0.43347424], [0.6848027 ], [0.23124644]], dtype=float32), edges=array([2, 2]), edge_links=array([[1, 0], [2, 1]]))
>>> space.sample(mask=(None, (np.array([1, 1, 1], dtype=np.int8), np.array([1, 0, 0], dtype=np.int8), np.array([0, 1, 0], dtype=np.int8), np.array([0, 1, 1], dtype=np.int8))), num_nodes=4, num_edges=4)
GraphInstance(nodes=array([[0.21213211], [0.4872798 ], [0.69442934], [0.92085034]], dtype=float32), edges=array([2, 0, 1, 2]), edge_links=array([[0, 2], [0, 0], [3, 2], [2, 0]]))
```

- [x] Add tests for sample mask
- [x] Add tests for sample with discrete distributions (Discrete, MultiDiscrete, MultiBinary)
- [ ] Add test for Box sample
- [x] Add docstrings
- [x] Add taxi docstrings
- [x] Fix taxi action mask
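The masking rule the PR describes for `Discrete` (positive mask, `1` = allowed) can be illustrated with a small stdlib-only sketch. This is not gym's implementation; the function name `sample_discrete` and its `rng` parameter are hypothetical. Falling back to `start` when every mask entry is `0` mirrors the `Discrete(4)` example in the PR, which returns `0` for an all-zero mask.

```python
import random
from typing import Optional, Sequence


def sample_discrete(n: int, mask: Optional[Sequence[int]] = None,
                    start: int = 0, rng: Optional[random.Random] = None) -> int:
    """Hypothetical illustration of masked sampling for a Discrete(n, start) space."""
    rng = rng or random.Random()
    if mask is None:
        # No mask: every value in [start, start + n) is possible
        return start + rng.randrange(n)
    if len(mask) != n:
        raise ValueError(f"mask must have length {n}, got {len(mask)}")
    if any(m not in (0, 1) for m in mask):
        raise ValueError("mask entries must be 0 or 1")
    # Only indices whose mask entry is 1 are legal choices
    valid = [i for i, m in enumerate(mask) if m == 1]
    if not valid:
        # No legal action: fall back to the first value, as in the PR's example
        return start
    return start + rng.choice(valid)


rng = random.Random(0)
print(sample_discrete(4, [0, 1, 1, 1], rng=rng) in {1, 2, 3})  # True
print(sample_discrete(4, [0, 0, 0, 0], rng=rng))  # 0
```

Sampling uniformly over the allowed indices is the simplest choice for random exploration; a learned policy would instead apply the same mask to its action distribution before sampling.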

Also, you may want to redirect Gymnasium environment questions to the Gymnasium GitHub issues or their Discord server.

PhilippWillms · June 9, 2023, 8:00pm

Thank you @Rohan138 for bringing more clarity to that point. Indeed, the Gymnasium discussion forums will also be helpful; nevertheless, in this Ray forum I am trying to reach the people who use RLlib in particular.
