[rllib] Dict Action Space and Custom Model

When using dict action spaces, your model should output a flat tensor, which is then passed into a MultiActionDistribution for action sampling. The sampling step returns a dict.
The alphabetical sorting of the dict keys is potentially a problem. However, it is forced upon RLlib by gym's own Dict space handling (Dict.spaces is an OrderedDict).

If you check the code in MultiActionDistribution (see ray/torch_action_dist.py on the master branch of ray-project/ray on GitHub), you will see that we create an alphabetically sorted action_space_struct dict, which we then use to regenerate the action dict from your flat tensor outputs.

In other words, as long as your model returns a tensor whose components are ordered alphabetically by the keys of your dict, it'll be fine. If you have additional nesting going on, print out self.action_space_struct in the MultiActionDistribution to see what the exact order should be.
Alternatively, you can use a custom action distribution, which would then handle your model's output (whatever that may be, e.g. a dict). In that case, you are responsible for the “handover” between model and action distribution.
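
To make the alphabetical ordering concrete, here is a minimal sketch in plain Python. It is not RLlib code: the helpers `flatten_sorted` and `sorted_leaf_paths` and the action keys (`steer`, `brake`, `gear`) are made up for illustration, and plain lists stand in for the per-component output tensors of your model. It only shows the order in which the components would need to be concatenated, assuming nested dicts are sorted by key at every level.

```python
from typing import Dict, List, Union

# A leaf is a list of floats (standing in for one output tensor);
# a node is a dict mapping keys to leaves or further dicts.
Nested = Union[List[float], Dict[str, "Nested"]]


def flatten_sorted(outputs: Nested) -> List[float]:
    """Concatenate all leaves, visiting dict keys in alphabetical order."""
    if isinstance(outputs, dict):
        flat: List[float] = []
        for key in sorted(outputs):
            flat.extend(flatten_sorted(outputs[key]))
        return flat
    return list(outputs)


def sorted_leaf_paths(outputs: Nested, prefix: str = "") -> List[str]:
    """Return the leaf names in the same order flatten_sorted visits them."""
    if isinstance(outputs, dict):
        paths: List[str] = []
        for key in sorted(outputs):
            paths.extend(sorted_leaf_paths(outputs[key], prefix + key + "/"))
        return paths
    return [prefix.rstrip("/")]


# Hypothetical per-component outputs, written in an arbitrary order.
per_component_outputs = {
    "steer": [0.1, 0.2],
    "brake": [0.3],
    "gear": {"up": [0.4], "down": [0.5]},
}

leaf_order = sorted_leaf_paths(per_component_outputs)
flat_output = flatten_sorted(per_component_outputs)

print(leaf_order)
print(flat_output)
```

In a real model you would concatenate tensors along the last dimension instead of extending lists, and you should still compare the resulting order against the printed self.action_space_struct rather than rely on this sketch.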
