If you have an action space that involves permuting a variable-sized set of objects (up to 25), is there a way to represent the corresponding invalid action mask in RLlib?
With my own (non-RLlib) custom learning code, this is easy: I store the number of objects at each step as an integer and use a Plackett-Luce distribution to efficiently sample permutations over this variable-sized set of objects.
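For reference, here is a minimal NumPy sketch of that non-RLlib approach. It assumes the policy outputs one logit per slot (25 in total) and that the valid objects occupy the first `num_objects` slots, so the "mask" is just a prefix length. Sampling uses the Gumbel trick (sorting logits plus Gumbel noise yields a Plackett-Luce sample), and the log-probability is the usual sequential product of softmax choices over the remaining items. The names `MAX_OBJECTS`, `sample_plackett_luce`, and `log_prob_plackett_luce` are hypothetical and not part of RLlib.

```python
import numpy as np

MAX_OBJECTS = 25  # upper bound on the set size (from the question)


def sample_plackett_luce(logits, num_objects, rng):
    """Sample a permutation of slots 0..num_objects-1.

    Assumes valid objects occupy the first `num_objects` slots; padded slots
    are simply ignored, which acts as the invalid-action mask.
    Gumbel trick: argsort of (logits + Gumbel noise), descending.
    """
    valid = logits[:num_objects]
    gumbel = rng.gumbel(size=num_objects)
    return np.argsort(-(valid + gumbel))


def log_prob_plackett_luce(logits, perm):
    """Log-probability of `perm` under Plackett-Luce with the given logits."""
    ordered = logits[perm]
    total = 0.0
    for i in range(len(ordered)):
        # Step i picks ordered[i] among the items not yet placed: ordered[i:].
        rest = ordered[i:]
        m = rest.max()
        total += ordered[i] - (m + np.log(np.exp(rest - m).sum()))
    return total


rng = np.random.default_rng(0)
logits = rng.normal(size=MAX_OBJECTS)   # stand-in for a policy head's output
perm = sample_plackett_luce(logits, 7, rng)  # only 7 objects valid this step
print(perm, log_prob_plackett_luce(logits, perm))
```

Because the network only ever emits `MAX_OBJECTS` logits, the cost of sampling and scoring grows with the set size rather than with the number of permutations.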
However, I don't see how I could directly use RLlib's support for invalid action masking here since, for starters, the number of possible permutations is far too large to even enumerate as individual actions.
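To put a number on "too big to enumerate": a quick calculation, assuming a hypothetical flat encoding in which every permutation of every possible set size (1 to 25) would be its own discrete action with its own mask entry.

```python
import math

MAX_OBJECTS = 25

# Orderings of a full set of 25 objects.
full = math.factorial(MAX_OBJECTS)

# Hypothetical flat action space: one action per permutation of each set size 1..25.
flat_total = sum(math.factorial(n) for n in range(1, MAX_OBJECTS + 1))

print(f"{full:.3e}")        # 1.551e+25
print(f"{flat_total:.3e}")
```

A mask over that many entries is infeasible, whereas a per-slot representation (25 logits plus a count or a length-25 boolean mask) stays small.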
Is such a custom action space supported by RLlib with invalid action masking?