Examples of using multiple (simultaneous) actions?

I have an actor-critic network that has to choose multiple actions simultaneously. The actions are discrete (specific values, not continuous), so I group them with gym.spaces.MultiDiscrete.
I see that RLlib has MultiActionDistribution/TorchMultiActionDistribution and MultiCategorical classes (from ray.rllib.models.torch.torch_action_dist), but I could not find any DQN or actor-critic examples that use them.

To be more precise: what should the output of the actor network be in a multi-action problem? Usually the actor network outputs one logit per action and a single action is chosen. In a multi-action problem, I suppose I would have to enumerate every possible combination of sub-actions and have the actor output one logit per combination (for example, MultiDiscrete([4, 3, 5]) gives 4 * 3 * 5 = 60 outputs), but in my case the number of combinations is in the hundreds. Can RLlib handle such multi-action problems more easily using the classes mentioned above?
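To illustrate the alternative I am hoping for: a factorized head that outputs 4 + 3 + 5 = 12 logits (one categorical per sub-action) instead of 60 joint ones. This is roughly what a MultiCategorical-style distribution does conceptually. Below is only a plain PyTorch sketch; the class and names (e.g. FactorizedActor) are made up for illustration and are not RLlib APIs.

```python
import torch
import torch.nn as nn

class FactorizedActor(nn.Module):
    """Hypothetical sketch: one categorical head per sub-action of MultiDiscrete([4, 3, 5])."""

    def __init__(self, obs_dim, nvec=(4, 3, 5)):
        super().__init__()
        self.nvec = nvec
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        # Single linear layer producing 4 + 3 + 5 = 12 logits; split per sub-action below.
        self.logits = nn.Linear(64, sum(nvec))

    def forward(self, obs):
        features = self.body(obs)
        flat_logits = self.logits(features)
        # Split into one logit vector per sub-action and sample each independently.
        split = torch.split(flat_logits, list(self.nvec), dim=-1)
        dists = [torch.distributions.Categorical(logits=l) for l in split]
        actions = torch.stack([d.sample() for d in dists], dim=-1)  # shape [batch, 3]
        # Joint log-prob is the sum of the per-sub-action log-probs.
        log_prob = torch.stack(
            [d.log_prob(a) for d, a in zip(dists, actions.unbind(-1))], dim=-1
        ).sum(-1)
        return actions, log_prob

actor = FactorizedActor(obs_dim=8)
acts, logp = actor(torch.randn(2, 8))
print(acts.shape, logp.shape)  # torch.Size([2, 3]) torch.Size([2])
```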

For PG-type algorithms ((A)PPO, A3C, PG), this should not be a problem and you can use a MultiDiscrete action space, even without having to “flatten” it into a Discrete one.

You can check our test case for this: ray/rllib/tests/test_supported_spaces.py::TestSupportedSpacesPG
where we test all the above algos for different action spaces. You can even use Dict/Tuple action spaces with these.
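For example, here is a minimal sketch of wiring a MultiDiscrete action space into PPO. It assumes the older ray.rllib.agents.ppo.PPOTrainer API and the classic gym.Env interface; exact imports and config keys vary across RLlib versions, and the toy environment and reward below are made up purely for illustration:

```python
import gym
import numpy as np
from gym.spaces import Box, MultiDiscrete
import ray
from ray.rllib.agents.ppo import PPOTrainer  # newer versions: ray.rllib.algorithms.ppo

class MultiActionEnv(gym.Env):
    """Toy env: three simultaneous discrete sub-actions per step."""

    def __init__(self, config=None):
        self.observation_space = Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
        self.action_space = MultiDiscrete([4, 3, 5])
        self._steps = 0

    def reset(self):
        self._steps = 0
        return self.observation_space.sample()

    def step(self, action):
        # `action` arrives as an array like [a0, a1, a2], one entry per sub-space.
        self._steps += 1
        reward = float(np.sum(action))  # placeholder reward for the sketch
        done = self._steps >= 20
        return self.observation_space.sample(), reward, done, {}

ray.init()
trainer = PPOTrainer(env=MultiActionEnv, config={"framework": "torch", "num_workers": 0})
print(trainer.train()["episode_reward_mean"])
```

With a MultiDiscrete space like this, the policy head emits one logit vector per sub-action (12 logits total here) rather than one output per joint combination (60).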
