Discrete tuple action space for simple Q

Hi,

I’m trying to use simple Q with a tuple action space, but it doesn’t seem set up for that. I want to use simple Q because I created a custom model for action masking, and when I use DQN the num_outputs is not set to the number of actions but rather the hidden size (I guess because of the dueling Qs or something).

Any advice? I’ve got this working with PPO, but since I have discrete actions, it seems like I should use Q.

Thanks!
Jonathan

Hi @jmugan,

I think what you would want to do here is:

  1. In your forward function, store the mask as a member variable of the model. Then do the standard forward pass with the regular observation portion of the input.
  2. Override get_q_value_distributions in your custom model and use the member variable stored in step 1 to do the masking.

You may need to mask out the “action_scores” too.
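
The two steps above can be sketched roughly like this. It is a plain-Python illustration, not RLlib code: the class `MaskedQModel`, the method `masked_q_values` (standing in for the override), the dict keys `"obs"` and `"action_mask"`, and the constant `FLOAT_MIN` are all made up for the example.

```python
FLOAT_MIN = -1e30  # hypothetical stand-in for "minus infinity"


class MaskedQModel:
    """Hypothetical stand-in for a custom model; not an RLlib class."""

    def __init__(self, q_fn):
        self.q_fn = q_fn         # maps an observation to a list of raw Q-values
        self.action_mask = None  # filled in by forward()

    def forward(self, input_dict):
        # Step 1: remember the mask, then run on the real observation only.
        self.action_mask = input_dict["action_mask"]
        return self.q_fn(input_dict["obs"])

    def masked_q_values(self, q_values):
        # Step 2: push the Q-values of invalid actions (mask == 0) far below
        # every valid one, so argmax can never select them.
        return [q if m else FLOAT_MIN for q, m in zip(q_values, self.action_mask)]


# Toy "network" that always returns the same three Q-values.
model = MaskedQModel(q_fn=lambda obs: [0.5, 2.0, 1.0])
q = model.forward({"obs": [0.0], "action_mask": [1, 0, 1]})
masked = model.masked_q_values(q)
best_action = max(range(len(masked)), key=masked.__getitem__)
```

Action 1 has the highest raw Q-value here but is masked out, so the greedy choice falls to action 2.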

Cool, thanks! That makes sense, but it gives me the error below. I have a tuple of discrete action spaces, which in theory should be fine, but it looks like it is looking for a simple discrete space.

    "Action space {} is not supported for DQN.".format(action_space))
    ray.rllib.utils.error.UnsupportedSpaceException: Action space Tuple(Discrete(58), Discrete(58), Discrete(58), Discrete(58), Discrete(58), Discrete(58)) is not supported for DQN.

@jmugan

Sorry, I totally misread your question the first time around. DQN does not support anything other than a single discrete action space out of the box.

@sven1977 or @gjoliver might be able to weigh in with changes you could make to get it to work. Or you could use PPO :wink:
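
For what it's worth, a generic way to squeeze a tuple of discrete actions into one discrete space is a mixed-radix encoding. The sketch below is plain Python, and the helper names `flatten_action` and `unflatten_action` are made up here; nothing in it is an RLlib API. The sizes are taken from the space in the error message.

```python
from math import prod

NVEC = [58] * 6  # sizes from the Tuple(Discrete(58), ...) space in the error


def flatten_action(action, nvec=NVEC):
    """Encode a tuple of sub-actions as one integer (mixed-radix)."""
    index = 0
    for a, n in zip(action, nvec):
        if not 0 <= a < n:
            raise ValueError(f"sub-action {a} out of range for Discrete({n})")
        index = index * n + a
    return index


def unflatten_action(index, nvec=NVEC):
    """Decode the integer back into the tuple of sub-actions."""
    action = []
    for n in reversed(nvec):
        index, a = divmod(index, n)
        action.append(a)
    return tuple(reversed(action))


num_flat_actions = prod(NVEC)  # size of the equivalent flat Discrete space
```

Note that for this particular space the flat version has 58**6 = 38,068,692,544 actions, far too many for one Q-output per action, so the trick only really helps when the tuple is small.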

I went back to a simple discrete action space, but it is not letting me override get_q_value_distributions. It is still calling the one in DQNTorchModel even though my custom model has the function. In the debugger, the type is OurModel_as_DQNTorchModel. I’ve never seen that kind of thing before. The way the models are constructed seems to be beyond my complexity horizon. I’ll create a separate thread.
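
One plausible explanation for the `OurModel_as_DQNTorchModel` type, sketched in plain Python: if the wrapper class is created dynamically with the DQN interface listed before the custom model in its bases, Python's method resolution order picks the interface's method first. The base-class order and the `wrap` helper below are assumptions for illustration, and `DQNTorchModelStandIn` is a dummy class, not the real `DQNTorchModel`.

```python
class DQNTorchModelStandIn:
    """Dummy stand-in for the DQN model interface."""

    def get_q_value_distributions(self, model_out):
        return "interface"


class OurModel:
    """Custom model that does NOT inherit from the interface."""

    def get_q_value_distributions(self, model_out):
        return "custom"


def wrap(model_cls, interface_cls):
    # Hypothetical wrapping logic: skip wrapping if the model already
    # implements the interface, otherwise build "<Model>_as_<Interface>".
    if issubclass(model_cls, interface_cls):
        return model_cls
    name = f"{model_cls.__name__}_as_{interface_cls.__name__}"
    # Interface first in the bases, so its methods win in the MRO.
    return type(name, (interface_cls, model_cls), {})


Wrapped = wrap(OurModel, DQNTorchModelStandIn)
wrapped_result = Wrapped().get_q_value_distributions(None)


class OurModelSubclass(DQNTorchModelStandIn):
    """Custom model that inherits from the interface directly."""

    def get_q_value_distributions(self, model_out):
        return "custom"


Unwrapped = wrap(OurModelSubclass, DQNTorchModelStandIn)
unwrapped_result = Unwrapped().get_q_value_distributions(None)
```

Under that assumption, the wrapped class ignores the custom override, while a model that subclasses the interface directly is left unwrapped and its override is used. So deriving the custom model from `DQNTorchModel` may be worth trying.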