Action Masking without Including "action_mask" in the Observation Space?

  • Medium: It causes significant difficulty in completing my task, but I can work around it.

I had a working action-masking model for discrete PPO, but I was trying to find a way to remove “action_mask” from the observation, because the mask has one entry per action and could become large and take up a lot of data.

Anyway, I couldn't find a way to access or pass environment variables (even ones included in the observation) that would let me build the action_mask vector inside the CustomActionMaskModel. The idea is to run the same check I use in the environment to fill the “action_mask” observation, but in the model code instead, so that the action_mask vector isn't passed as an observation. The issue is that inside the model the values have already been preprocessed and are tensors…
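One way to think about the tensor problem: the environment check has to be rewritten as vectorized tensor operations over a whole batch, instead of Python `if` logic over a single step. Below is a minimal NumPy sketch of that idea. The masking rule (column 0 is the first valid action index, column 1 is one past the last valid index) and the names `NUM_ACTIONS` and `mask_from_features` are hypothetical stand-ins for whatever check the env really does. It also assumes the compact features reach the model's `forward()` unchanged (for example, not normalized by an observation filter). The same broadcasting comparisons work with `torch.arange` on torch tensors.

```python
import numpy as np

NUM_ACTIONS = 8  # hypothetical N, the size of the discrete action space


def mask_from_features(features: np.ndarray, num_actions: int = NUM_ACTIONS) -> np.ndarray:
    """Build a (batch, num_actions) boolean mask from compact per-step features.

    Hypothetical rule: features[:, 0] is the first valid action index and
    features[:, 1] is one past the last valid index; other columns are unused here.
    """
    action_ids = np.arange(num_actions)  # shape (N,)
    low = features[:, 0:1]   # shape (batch, 1) so it broadcasts against (N,)
    high = features[:, 1:2]  # shape (batch, 1)
    # Elementwise comparisons replace the per-step Python check from the env.
    return (action_ids >= low) & (action_ids < high)  # shape (batch, N)


# Two timesteps, 5 floats each, instead of 2 x NUM_ACTIONS booleans.
features = np.array(
    [[2.0, 5.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0, 0.0]],
    dtype=np.float32,
)
mask = mask_from_features(features)
print(mask.astype(int))
```

Because the rule is expressed as tensor ops, it runs on the batched, already-tensorized observations the model receives, so no Python access to the environment is needed.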

Maybe I'm thinking about it wrong, but an agent shouldn't need to observe which actions are available if the network already assigns those actions a probability of 0…
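For reference, "probability 0 in the network" usually means pushing the logits of invalid actions to a huge negative value before the softmax. RLlib's bundled action-masking example takes a similar approach by adding a clamped log of the mask to the logits. The NumPy sketch below shows the effect. `FLOAT_MIN` and `masked_softmax` are illustrative names, and the mask is assumed to be a boolean array that the model has already computed.

```python
import numpy as np

FLOAT_MIN = np.finfo(np.float32).min  # large negative stand-in for -inf


def masked_softmax(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over the last axis with invalid actions (mask == False) forced to probability 0."""
    # Replace logits of invalid actions with a huge negative number.
    masked_logits = np.where(mask, logits, FLOAT_MIN)
    # Subtract the row max for numerical stability before exponentiating.
    shifted = masked_logits - masked_logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)  # invalid actions underflow to exactly 0.0
    return exp / exp.sum(axis=-1, keepdims=True)


logits = np.array([[1.0, 2.0, 3.0, 4.0]])
mask = np.array([[True, False, True, False]])
probs = masked_softmax(logits, mask)
print(probs)
```

One caveat: if every action in a row is masked, the shifted logits are all equal and the result falls back to a uniform distribution, so the masking rule should always leave at least one valid action.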

The reason I want this is that, given my env and action space, 5 float values are enough to determine the mask for any number N of actions, instead of having to pass N booleans of masked actions.