Action masking error

mannyv · February 6, 2023, 1:57pm

The horizon determines the maximum number of steps an episode can have. Many environments have no horizon; the environment simply returns done=True when some terminating condition occurs.

Other environments also terminate on their own, but you may still want to cap the episode length, so that an episode that has not terminated after x steps is terminated artificially.
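As a rough illustration of that second case, here is a minimal sketch of a step cap in plain Python. It is not RLlib's implementation: `NeverEndingEnv`, `HorizonWrapper`, and `run_episode` are hypothetical names made up for this example, and the old 4-tuple `step` return (obs, reward, done, info) is assumed.

```python
class NeverEndingEnv:
    """Hypothetical toy environment that never returns done=True by itself."""

    def reset(self):
        return 0  # dummy observation

    def step(self, action):
        # (observation, reward, done, info)
        return 0, 1.0, False, {}


class HorizonWrapper:
    """Hypothetical wrapper: forces done=True once `horizon` steps have elapsed."""

    def __init__(self, env, horizon):
        self.env = env
        self.horizon = horizon
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        if self.steps >= self.horizon:
            done = True  # artificial termination, not a real terminal state
        return obs, reward, done, info


def run_episode(env):
    """Step with a fixed dummy action until done; return the episode length."""
    env.reset()
    length = 0
    done = False
    while not done:
        _, _, done, _ = env.step(0)
        length += 1
    return length


episode_length = run_episode(HorizonWrapper(NeverEndingEnv(), horizon=182))
```

Without the wrapper, `run_episode` would loop forever on this toy environment; with it, the episode stops at the cap.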

In the example above there is a third case. Here the environment is exactly 182 steps long and is terminated using the horizon. It could have returned done=True after 182 steps, but they chose to do it this way instead.

Now in this environment there are 182 decisions to be made (actions to take), and each action can only be taken once. This is why the action space size and the horizon match. That is essentially an accident (feature) of this environment. In most cases the size of the action space and the horizon will not match up.
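To make the "each action can only be taken once" point concrete, here is a small sketch of such an environment in plain Python. It is not the environment from the example discussed above: `PickEachOnceEnv`, the `"action_mask"` key, and the zero reward are all assumptions for illustration. The episode length ends up equal to the number of actions only because every action is masked out after it is used.

```python
class PickEachOnceEnv:
    """Hypothetical env: n_actions choices, each one usable exactly once."""

    def __init__(self, n_actions=182):
        self.n_actions = n_actions
        self.mask = [1] * n_actions  # 1 = still available, 0 = already taken

    def reset(self):
        self.mask = [1] * self.n_actions
        return {"action_mask": list(self.mask)}

    def step(self, action):
        if self.mask[action] == 0:
            raise ValueError(f"action {action} was already taken")
        self.mask[action] = 0
        done = sum(self.mask) == 0  # nothing left to choose
        return {"action_mask": list(self.mask)}, 0.0, done, {}


env = PickEachOnceEnv(n_actions=182)
obs = env.reset()
steps = 0
done = False
while not done:
    # Trivial policy: take the first action the mask still allows.
    action = obs["action_mask"].index(1)
    obs, _, done, _ = env.step(action)
    steps += 1
```

Here the environment returns done=True itself once the mask is empty; capping the episode with a horizon of 182 would end it at the same step.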
