Rainbow/DQN with MultiDiscrete Action Spaces


I’m currently trying to train a MultiAgentEnvironment with a MultiDiscrete action space. Since a MultiDiscrete action space is preferable to a continuous one in this simulation, I’d like to ask whether MultiDiscrete action spaces could be added to the supported spaces for Rainbow/DQN.

I know that quite a few people have tried to use MultiDiscrete action spaces, but they either flattened the action space or used PPO instead. Simply flattening the action space would massively increase the size of the network’s output layer, since every possible combination of discrete actions would get its own Q-value; this approach would therefore most likely increase both the training time and the required network complexity.
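To illustrate the blow-up, here is a small sketch (the per-dimension sizes below are hypothetical, just for illustration): flattening a MultiDiscrete space into a single Discrete space needs one output per *combination* of sub-actions, i.e. the product of the dimension sizes, while a factored design would only need their sum.

```python
import math

# Hypothetical per-dimension sizes, as in MultiDiscrete([5, 5, 3, 3]).
nvec = [5, 5, 3, 3]

# Flattening enumerates every combination of sub-actions, so the
# equivalent Discrete space (and the Q-network's output layer) needs
# one unit per combination:
flat_size = math.prod(nvec)   # 5 * 5 * 3 * 3 = 225

# A factored approach (one output head per dimension) would instead
# need only the sum of the dimension sizes:
factored_size = sum(nvec)     # 5 + 5 + 3 + 3 = 16

print(flat_size, factored_size)  # → 225 16
```

With more or larger dimensions the product grows exponentially while the sum grows linearly, which is exactly why flattening hurts for anything beyond small spaces.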

This would not only help in my particular use case, but also make it possible to evaluate the performance of Rainbow against PPO in several example use cases.

I hope someone can help me make it possible to train my simulation with Rainbow, or point me to the parts of the code I could change to make it work.


Hey @iceboy910447 , we currently don’t have plans to expand our Q-learning algos to run on any non-discrete spaces (continuous and/or complex, like Dict, Tuple, MultiDiscrete). Yes, the only solution right now would be to flatten/discretize your action space, but if you have a very complex action space, this would probably lead to huge output heads in your model and therefore reduced learning performance.

Feel free to provide a fix for that, though, and open a PR. There are tons of cool papers out there that describe such fixes for DQN, whether to allow for continuous actions or for vast discrete spaces (thousands of discrete actions):
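As one illustration of that family of fixes, a factored (“branching”) design in the spirit of Tavakoli et al.’s Action Branching Architectures gives each MultiDiscrete dimension its own small Q-head and picks the greedy sub-action per branch independently. This is only a sketch of the action-selection step, not RLlib code; the function name and the example Q-values are made up:

```python
import numpy as np

def branching_greedy_action(q_branches):
    """Pick the greedy sub-action independently in each branch.

    q_branches: list of 1-D arrays, one per MultiDiscrete dimension,
    each holding that branch's Q-values. The joint action is the
    tuple of per-branch argmaxes, so the output layer only needs
    sum(nvec) units instead of prod(nvec).
    """
    return [int(np.argmax(q)) for q in q_branches]

# Hypothetical Q-values for a MultiDiscrete([3, 2]) space:
q = [np.array([0.1, 0.9, 0.3]), np.array([0.5, 0.2])]
print(branching_greedy_action(q))  # → [1, 0]
```

The corresponding TD target is usually built from a single aggregate (e.g. the mean of the per-branch max Q-values), which is where most of the actual implementation work inside the DQN loss would go.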