Action masking for multi-agent DQN

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am trying to train a multi-agent model with action masking, based on this example. However, in that example num_outputs appears to be the same size as the action space, which is not the case in my model, so I am unsure how to proceed.

Some things I have attempted:

  • Ignoring the num_outputs parameter when building my neural network and instead giving the final layer as many neurons as there are actions in the action space. This raised the following error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x19 and 256x256)
  • Emulating the implementation shown here. I was unsure about avail_actions and how to generate them, and I received the same RuntimeError as in the previous attempt.
  • Changing num_outputs to be the same size as the action space. I was unable to determine where num_outputs is set, and I did not see any config options that appeared to refer to it.

Here is the code for my model (simplified).

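import torch
import torch.nn as nn
from gym.spaces.utils import flatdim
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2

class MyModel(TorchModelV2, nn.Module):
	def __init__(self, obs_space, act_space, num_outputs, *args, **kwargs):
		TorchModelV2.__init__(self, obs_space, act_space, num_outputs, *args, **kwargs)
		# TorchModelV2 subclasses must also inherit from and initialize nn.Module.
		nn.Module.__init__(self)

		# The action mask occupies the last act_space.n entries of the flattened observation.
		self.n_actions = act_space.n
		self.model = nn.Sequential(
			nn.Linear(flatdim(obs_space) - self.n_actions, 8192),
			nn.Linear(8192, num_outputs),
		)

	def forward(self, input_dict, state, seq_lens):
		# Sanity check: the trailing slice of obs_flat is the action mask.
		assert torch.equal(input_dict["obs_flat"][:, -self.n_actions:], input_dict["obs"]["action_mask"])
		# Feed only the non-mask part of the observation to the network.
		model_out = self.model(input_dict["obs_flat"][:, :-self.n_actions])

		return model_out, state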
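
For reference, the masking step that forward is still missing is usually a small addition applied to the model output. The following is a minimal sketch in plain PyTorch, not RLlib's own code. The names mask_q_values and FLOAT_MIN are made up for illustration, and the sketch assumes that the model output has exactly one entry per action and that the mask is 1 for valid actions and 0 for invalid ones.

```python
import torch

# Smallest finite float32 value; stands in for "minus infinity" without producing NaNs.
FLOAT_MIN = torch.finfo(torch.float32).min


def mask_q_values(q_values: torch.Tensor, action_mask: torch.Tensor) -> torch.Tensor:
    """Push the values of invalid actions (mask == 0) to a very large negative number."""
    # log(1) = 0 leaves valid actions unchanged; log(0) = -inf is clamped to FLOAT_MIN.
    inf_mask = torch.clamp(torch.log(action_mask.float()), min=FLOAT_MIN)
    return q_values + inf_mask


# Toy batch with one observation and three actions; action 1 is invalid.
q = torch.tensor([[0.5, 2.0, -1.0]])
mask = torch.tensor([[1.0, 0.0, 1.0]])

masked = mask_q_values(q, mask)
best = masked.argmax(dim=1)  # greedy action among the valid ones
```

In the model above, this would correspond to returning mask_q_values(model_out, input_dict["obs"]["action_mask"]) from forward, provided that model_out really has one column per action.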

Any help would be greatly appreciated!

Hello, I am having the same exact issue. Has a solution been found for this?
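
A possible lead on where num_outputs comes from, offered as an assumption rather than a confirmed fix: in RLlib's ModelV2-based DQN, num_outputs does not seem to be a config option of its own. When the hiddens setting is non-empty, the policy appears to size the custom model's output to the last hidden layer (256 by default) and to build its own Q-value head on top, which would match the 256x256 in the RuntimeError above. With hiddens set to an empty list and dueling disabled, num_outputs should equal the number of actions. A sketch of such a config fragment follows; "my_model" is a hypothetical registration name.

```python
# Sketch of a DQN config fragment (assumes RLlib's ModelV2-based DQN).
# "my_model" is a hypothetical name; it would have to match the name passed to
# ModelCatalog.register_custom_model("my_model", MyModel).
dqn_config = {
    # No extra Q-head layers, so the custom model's output is used as the Q-values.
    "hiddens": [],
    # No separate value/advantage streams on top of the custom model.
    "dueling": False,
    "model": {
        "custom_model": "my_model",
    },
}
```

If this assumption holds, the final layer of the custom model can keep using num_outputs, and any masking applied inside forward is not processed further by additional layers.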