Using Dict observation space with custom RLModule

This ends in the same error message as described in "Training Action Masked PPO - ValueError: all input arrays must have the same shape", but in a different setting: multi-agent here, single-agent action masking there. I recommend opening a GitHub issue, since the RLModule API is under active development at the moment.
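For anyone trying to narrow this down before filing the issue: the message text matches what NumPy's `np.stack` raises when it is given arrays of different shapes. The sketch below only reproduces that message in isolation. It is not the RLlib code path, and the dict keys, shapes, and the `batch` helper are hypothetical, chosen to mimic a Dict observation space where one entry differs in shape between samples (for example, between agents).

```python
import numpy as np

# Hypothetical Dict observations; the "action_mask" entries differ in length.
obs_a = {
    "observations": np.zeros(4, dtype=np.float32),
    "action_mask": np.ones(3, dtype=np.float32),
}
obs_b = {
    "observations": np.zeros(4, dtype=np.float32),
    "action_mask": np.ones(2, dtype=np.float32),
}


def batch(observations):
    """Stack each key of a list of dict observations into one array (illustrative helper)."""
    return {key: np.stack([o[key] for o in observations]) for key in observations[0]}


try:
    batch([obs_a, obs_b])
    error_message = None
except ValueError as exc:
    # np.stack refuses to combine arrays whose shapes differ.
    error_message = str(exc)

print(error_message)
```

If the per-key shapes in your own samples differ like this, printing them per agent would be useful detail to include in the GitHub issue.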