I have been working on an RLlib project and am looking at how action masking could improve results. The environment is very hard to train because invalid actions are so frequent (they often outnumber the valid ones).
Currently the action space is a MultiDiscrete([2, 5, 5]). The first component is a boolean 0/1, and the second and third together make up a pair of Cartesian x, y coordinates. The components are dependent on each other. For example, [0, 1, 1] may be a valid action but [1, 1, 1] isn't. Unfortunately, it looks like you can only mask out individual component values (such as 0 for the first value), not combinations of them.
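To make the problem concrete, here is a minimal NumPy sketch of what I mean. It does not use RLlib at all, and the validity rule in it (only [1, 1, 1] is forbidden) is a made-up assumption for illustration, not my real environment logic. The names NVEC, joint_valid and factored are also just placeholders.

```python
import numpy as np

# Sizes of the three components, matching MultiDiscrete([2, 5, 5]): flag, x, y.
NVEC = (2, 5, 5)

# Hypothetical joint validity table (an assumption for this sketch):
# every combination is valid except flag=1 at cell (1, 1).
joint_valid = np.ones(NVEC, dtype=bool)
joint_valid[1, 1, 1] = False

# Best possible per-component masks: a value is allowed if it appears
# in at least one valid combination.
flag_mask = joint_valid.any(axis=(1, 2))
x_mask = joint_valid.any(axis=(0, 2))
y_mask = joint_valid.any(axis=(0, 1))

# What those three independent masks actually permit: every combination
# of individually allowed values.
factored = (
    flag_mask[:, None, None]
    & x_mask[None, :, None]
    & y_mask[None, None, :]
)

print(joint_valid[0, 1, 1])  # [0, 1, 1] is valid
print(joint_valid[1, 1, 1])  # [1, 1, 1] is invalid
print(factored[1, 1, 1])     # but the per-component masks still allow it
```

Because 1 is a valid flag somewhere, and x=1 and y=1 are each valid somewhere, none of them can be masked out individually, so the invalid combination [1, 1, 1] slips through.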
Has anyone else run into this and found a way around it, or is this something that cannot be solved?