Action masking for multi-agent DQN

bretsky · April 1, 2022, 1:00pm

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

I am trying to train a multi-agent model with action masking, based on this example. However, in that example num_outputs is the same size as the action space, which is not the case in my model, so I am unsure how to proceed.

Some things I have attempted:

Ignoring the num_outputs parameter when building my neural network and instead giving the final layer as many neurons as there are actions. I received the following error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x19 and 256x256)
Emulating the implementation shown here. I was unsure about avail_actions and how to generate them, and I received the same RuntimeError as in the previous attempt.
Changing num_outputs to be the same size as the action space. I was unable to determine where num_outputs is set, and I didn't see any config option that appeared to refer to it.
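
A note on the third attempt: judging by the RLlib source from around this time, DQN does not seem to expose num_outputs as a config key. It appears to be derived from the `hiddens` setting, because DQN stacks its own Q-head on top of the custom model, which would also be consistent with the 256x256 matrix in the error above. RLlib's parametric-actions example sets `hiddens` to an empty list and turns `dueling` off for DQN so that the model output is used as the Q-values directly. A minimal sketch of such a config, assuming the old dict-style trainer config and a model registered under the hypothetical name "my_masked_model":

```python
# Sketch only: a plain dict in the style of the old RLlib trainer configs.
# "my_masked_model" is a hypothetical name; the model would have to be
# registered with ModelCatalog.register_custom_model before training.
dqn_config = {
    "framework": "torch",
    "model": {
        "custom_model": "my_masked_model",
    },
    # Assumption: with no extra Q-head layers and no dueling head, the custom
    # model's output is treated as one Q-value per action, so num_outputs
    # should come out equal to action_space.n.
    "hiddens": [],
    "dueling": False,
}
```

Whether this removes the shape mismatch depends on the RLlib version in use, so it is worth printing num_outputs inside the model's __init__ to confirm.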

Here is the code for my model (simplified).

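import torch
import torch.nn as nn
from gym.spaces import flatdim
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2

# RLlib torch models are expected to subclass nn.Module as well.
class MyModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, act_space, num_outputs, *args, **kwargs):
        TorchModelV2.__init__(self, obs_space, act_space, num_outputs, *args, **kwargs)
        nn.Module.__init__(self)

        # The flattened observation ends with the action mask (act_space.n = 19 entries);
        # the network only sees the remaining features.
        self.num_actions = act_space.n
        self.model = nn.Sequential(
            nn.Linear(flatdim(obs_space) - self.num_actions, 8192),
            nn.Linear(8192, num_outputs),
        )

    def forward(self, input_dict, state, seq_lens):
        # Sanity check: the trailing columns of obs_flat hold the action mask.
        assert torch.equal(input_dict["obs_flat"][:, -self.num_actions:], input_dict["obs"]["action_mask"])
        model_out = self.model(input_dict["obs_flat"][:, :-self.num_actions])

        return model_out, state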
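
Note that the forward pass above checks the action mask but never applies it. A common way to apply a mask is to push the values of unavailable actions to a very large negative number before taking the greedy argmax. The sketch below shows the idea in plain Python on lists; the function names are made up for illustration. In a torch model the same effect is typically obtained by adding a clamped log of the mask to model_out, which is what RLlib's example action-mask model appears to do.

```python
import math

def apply_action_mask(q_values, action_mask):
    """Return a copy of q_values with unavailable actions set to -inf."""
    if len(q_values) != len(action_mask):
        raise ValueError("q_values and action_mask must have the same length")
    return [q if m else -math.inf for q, m in zip(q_values, action_mask)]

def greedy_action(q_values, action_mask):
    """Pick the highest-valued action among those the mask allows."""
    if not any(action_mask):
        raise ValueError("action_mask allows no actions")
    masked = apply_action_mask(q_values, action_mask)
    return max(range(len(masked)), key=masked.__getitem__)

q = [0.2, 1.5, -0.3, 0.9]
mask = [1, 0, 1, 1]
best = greedy_action(q, mask)  # action 1 is masked out, so index 3 is chosen
```

This only works as intended if nothing downstream transforms the masked values again, which is why the model output needs to be used as the Q-values directly.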

Any help would be greatly appreciated!

Amadou · February 23, 2023, 1:29pm

Hello, I am having exactly the same issue. Has a solution been found for this?
