I’m currently trying to train a MultiAgentEnvironment with a MultiDiscrete action space. Since a multidiscrete action space would be preferable over a continuous one in this simulation, I’d like to ask whether it would be possible to add MultiDiscrete action spaces to the supported spaces for Rainbow/DQN.
I know that quite a few people have tried to use MultiDiscrete action spaces, but they either flattened the action space or switched to PPO instead. Simply flattening the action space massively increases the size of the network’s output layer, because every possible combination of discrete actions gets its own Q-value. This approach would therefore most likely increase both the training time and the required network complexity.
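To make the blowup concrete, here is a small sketch (the branch sizes are made up for illustration) comparing the output-layer size of a naively flattened MultiDiscrete space with one Q-value head per branch, plus the index conversion a flattening wrapper would need:

```python
import numpy as np

# Hypothetical MultiDiscrete action space: three branches with 4, 5 and 6 choices.
branch_sizes = [4, 5, 6]

# Naive flattening: one Q-value per *combination* of branch actions.
flattened_outputs = int(np.prod(branch_sizes))  # 4 * 5 * 6 = 120

# Per-branch Q-value heads (as in action-branching architectures like BDQ):
# one output per choice per branch, so the size grows additively, not multiplicatively.
branched_outputs = sum(branch_sizes)  # 4 + 5 + 6 = 15

def branches_to_flat(actions, sizes):
    """Encode per-branch actions as a single flat Discrete index."""
    idx = 0
    for a, s in zip(actions, sizes):
        idx = idx * s + a
    return idx

def flat_to_branches(idx, sizes):
    """Decode a flat Discrete index back into per-branch actions."""
    actions = []
    for s in reversed(sizes):
        actions.append(idx % s)
        idx //= s
    return list(reversed(actions))
```

With ten binary branches the flattened head already needs 2^10 = 1024 outputs, while per-branch heads need only 20, which is why native MultiDiscrete support seems preferable to a flattening wrapper.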
This would not only help in my specific use case, but would also make it possible to evaluate the performance of Rainbow against PPO in several example use cases.
I hope someone can help me train my simulation with Rainbow, or point me to the parts of the code I could change to make it work.