Yes, you’re right: my spaces are dictionaries, so the network does not flatten them. I’ll resort to using the flattened observation space instead (it contains the action masks, but that’s not really a big deal).
However, this still does not work, because DQN adds a final linear layer on top of the output of the model’s forward(), so the masking in action_mask_model.py has no effect.
Do you know how to disable this behavior?
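
To make the problem concrete, here is a minimal NumPy sketch of what I think is happening. It is not RLlib code: the logits, the weight matrix `W`, and all variable names are made up for illustration, and the matrix multiply is only a stand-in for whatever layer DQN appends after forward().

```python
import numpy as np

# Hypothetical example: 4 actions, of which actions 1 and 3 are invalid.
action_mask = np.array([1.0, 0.0, 1.0, 0.0])

# Stand-in for the logits returned by the model's forward().
logits = np.array([0.5, 2.0, -0.3, 1.0])

# Usual masking trick: add a very large negative number to invalid actions.
FLOAT_MIN = float(np.finfo(np.float32).min)
inf_mask = np.where(action_mask > 0, 0.0, FLOAT_MIN)
masked_logits = logits + inf_mask

# Taking the argmax directly on the masked logits respects the mask.
masked_action = int(np.argmax(masked_logits))

# Assumed extra linear layer applied AFTER the masking (made-up weights).
W = np.array([
    [ 0.2, -0.1,  0.3,  0.0],
    [ 0.1, -0.5,  0.2,  0.1],
    [-0.3,  0.4,  0.1,  0.2],
    [ 0.0,  0.1, -0.1,  0.3],
])
q_values = masked_logits @ W

# The large negative entries are mixed into every output, so the mask is lost.
final_action = int(np.argmax(q_values))

print("action from masked logits:", masked_action, "valid:", bool(action_mask[masked_action]))
print("action after extra layer: ", final_action, "valid:", bool(action_mask[final_action]))
```

With these particular weights, the action chosen from the masked logits is valid, but the action chosen after the extra layer is one of the masked-out ones, which matches what I am seeing.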