What is the difference between an action mask and available actions?

I am trying to understand the difference between masked actions and available actions.
As I understand it, the action mask is returned by the environment.
I am trying to solve a combinatorial environment (see https://arxiv.org/pdf/2003.03600.pdf) with a very large action space (preparing a school timetable).

As I understand it, the relationship between the action space (all actions), masked actions, and available actions looks like this:

Do I understand this correctly?


Hey @Peter_Pirog , thanks for posting this question!
I think it’s even simpler. Take a look at this environment here:

It produces an additional “action_mask” observation component, which is basically a binary tensor of length N (N = number of all actions) with values of either 0.0 (not available) or 1.0 (available).
This tensor is the “mask”.
The “available actions” are simply the list of action values that are currently possible.

For example:
action_space = Discrete(10)
obs = env.reset()
# obs now looks like this:
obs = {
    "action_mask": [0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0],  # only actions 1, 5, and 9 are "available"
    "actual_obs": …,
}
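
To make the relationship concrete, here is a minimal sketch in plain Python of converting between the two representations. The helper names `available_actions` and `mask_from_available` are made up for illustration and are not part of RLlib:

```python
# Mask as it appears in the observation from the example above.
action_mask = [0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]


def available_actions(mask):
    """Return the indices whose mask entry is 1.0, i.e. the allowed actions."""
    return [action for action, allowed in enumerate(mask) if allowed > 0.0]


def mask_from_available(actions, num_actions):
    """Inverse direction: build the binary mask from a list of allowed actions."""
    allowed = set(actions)
    return [1.0 if a in allowed else 0.0 for a in range(num_actions)]


available = available_actions(action_mask)
print(available)  # [1, 5, 9]
```

So the mask and the list of available actions carry the same information; the mask just has a fixed length, which makes it easier to feed into a model.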

An action-masking capable model (example here: ray/rllib/examples/models/action_mask_model.py) then needs to interpret the “action_mask” field in the observation accordingly: it sets the logits of all unavailable actions to -inf, so those actions are never sampled.
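
The effect of the -inf logits can be sketched in plain Python as follows. This is only an illustration of the idea, not the code from the RLlib example model; `masked_softmax` is a hypothetical helper, and real implementations may use a very large negative finite number instead of -inf to avoid NaNs:

```python
import math


def masked_softmax(logits, action_mask):
    """Softmax over logits in which masked-out actions get probability 0."""
    if not any(m > 0.0 for m in action_mask):
        raise ValueError("at least one action must be available")
    # Replace the logits of unavailable actions with -inf.
    masked = [l if m > 0.0 else float("-inf") for l, m in zip(logits, action_mask)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


action_mask = [0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
logits = [0.0] * 10  # the model has no preference among the actions
probs = masked_softmax(logits, action_mask)
# Only actions 1, 5, and 9 end up with non-zero probability.
```

Because exp(-inf) is 0, the unavailable actions drop out of the distribution and the remaining probability mass is shared among the available ones.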

@sven1977 Thank you for explaining the meaning of “available actions”. Now it’s clear to me :slight_smile: