Action masking error

Hi @Archana_R,

The horizon determines the maximum number of steps an episode can have. Many environments have no horizon, and the environment simply returns done=True when some terminating condition occurs.

Other environments also have a terminating condition, but you want to cap the episode length as well, so that if the episode has not terminated after x steps then it is artificially terminated.

In the example above there is a third case. Here the environment is exactly 182 steps long and is terminated using horizon. It could have returned done=True after 182 steps but they chose to do it this way instead.
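To make the three cases concrete, here is a minimal sketch in plain Python. It does not use RLlib at all; `ToyEnv` and `rollout` are hypothetical names invented for illustration, and the loop only mimics the idea of a horizon cutting an episode off.

```python
class ToyEnv:
    """Hypothetical toy environment (not from RLlib).

    Returns done=True after `length` steps, or never if `length` is None.
    """

    def __init__(self, length=None):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.length is not None and self.t >= self.length
        return self.t, 0.0, done, {}


def rollout(env, horizon=None):
    """Step until the env says done or the horizon is reached; return step count."""
    env.reset()
    steps = 0
    done = False
    while not done:
        _, _, done, _ = env.step(0)
        steps += 1
        if horizon is not None and steps >= horizon:
            break  # artificial termination by the horizon
    return steps


# Case 1: no horizon, the env terminates itself.
case_1 = rollout(ToyEnv(length=10))
# Case 2: the env would terminate itself, but the horizon cuts it short.
case_2 = rollout(ToyEnv(length=10), horizon=4)
# Case 3: the env never returns done; only the horizon ends the episode.
case_3 = rollout(ToyEnv(length=None), horizon=182)
```

In case 3 the episode length is set entirely by the horizon, which is the situation described for the 182-step environment.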

Now in this environment there are 182 decisions to be made (actions to take), and each action can only be taken once. This is why the action space size and the horizon match. That is essentially an accident (feature) of this environment. In most cases the size of the action space and the horizon will not match up.
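A rough sketch of why the two numbers coincide when each action can be used only once. This is not the environment from the example; `PickEachOnceEnv` and `run_episode` are hypothetical names, and the random policy just stands in for whatever the agent would pick among the unmasked actions.

```python
import random


class PickEachOnceEnv:
    """Hypothetical env: n_actions discrete actions, each usable only once."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.mask = [1] * n_actions

    def reset(self):
        # 1 = action still available, 0 = already taken
        self.mask = [1] * self.n_actions
        return list(self.mask)

    def step(self, action):
        if not self.mask[action]:
            raise ValueError(f"action {action} was already taken")
        self.mask[action] = 0
        return list(self.mask), 0.0, {}


def run_episode(n_actions, seed=0):
    """Pick random valid actions until none are left; return the step count."""
    rng = random.Random(seed)
    env = PickEachOnceEnv(n_actions)
    mask = env.reset()
    steps = 0
    while any(mask):
        valid = [a for a, ok in enumerate(mask) if ok]
        mask, _, _ = env.step(rng.choice(valid))
        steps += 1
    return steps


steps_taken = run_episode(182)
```

Because every step removes exactly one action from the mask, the episode can last exactly as many steps as there are actions, so a horizon equal to the action count ends it at the right point.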
