Override get_q_value_distributions

Hi, I’m trying to mask actions for DQN.

The comment at line 134 of ray/dqn_torch_model.py (ray-project/ray on GitHub, master branch) says to override get_q_value_distributions, but when I do that it ignores my get_q_value_distributions and instead calls the one in DQNTorchModel. I guess this is because build_q_model_and_distribution, at line 169 of ray/dqn_torch_policy.py, sets model_interface=DQNTorchModel.

So, how do I get it to use my get_q_value_distributions? Do I have to create a new model interface and pass that in instead? That’s pretty deep in the DQN abstraction, so I am probably confused about something. Thanks!

Hi @jmugan,

Two questions for you:

  1. I am sure you are doing it right, but just as a double check: how are you specifying the custom model in the config?

  2. Is your custom model a subclass of DQNTorchModel?
    If it is not, this code here will make it one.

The way the multiple inheritance is ordered, Python would find rllib’s version before yours.

The model is registered like this:

    ModelCatalog.register_custom_model("unit_model", OurModel)

and OurModel inherits from FullyConnectedNetwork.

If I try to inherit from DQNTorchModel, I don’t know how to write the custom forward method where I can put in self.inf_mask = torch.clamp(torch.log(action_mask), FLOAT_MIN, FLOAT_MAX)
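For reference, the clamp trick itself works like this on plain tensors. This is just a standalone sketch; FLOAT_MIN and FLOAT_MAX here are my own stand-ins for the large finite bounds rllib uses:

```python
import torch

# Stand-ins for rllib's FLOAT_MIN / FLOAT_MAX constants (assumption:
# any huge-but-finite float32 bounds behave the same way here)
FLOAT_MIN = torch.finfo(torch.float32).min
FLOAT_MAX = torch.finfo(torch.float32).max

# 1.0 = action allowed, 0.0 = action forbidden
action_mask = torch.tensor([1.0, 0.0, 1.0, 0.0])

# log(1) = 0 leaves allowed actions untouched; log(0) = -inf gets
# clamped to a huge-but-finite negative number for forbidden actions
inf_mask = torch.clamp(torch.log(action_mask), FLOAT_MIN, FLOAT_MAX)

q_values = torch.tensor([0.5, 2.0, -1.0, 3.0])
masked_q = q_values + inf_mask  # forbidden actions can never be the argmax
```

Adding the mask (rather than multiplying) is what makes forbidden actions effectively minus infinity without producing NaNs in the gradient.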

Hi @jmugan,

Sorry for the delayed response. I had a busy weekend. I would think you might try something like this:

class MyVeryOwnDQNTorchModel(DQNTorchModel):

    def __init__(self,
                 obs_space: gym.spaces.Space,
                 action_space: gym.spaces.Space,
                 num_outputs: int,
                 model_config: ModelConfigDict,
                 name: str,
                 q_hiddens: Sequence[int] = (256, ),
                 dueling: bool = False,
                 dueling_activation: str = "relu",
                 num_atoms: int = 1,
                 use_noisy: bool = False,
                 v_min: float = -10.0,
                 v_max: float = 10.0,
                 sigma0: float = 0.5,
                 add_layer_norm: bool = False):
        super().__init__(
            obs_space, action_space, num_outputs, model_config, name,
            q_hiddens=q_hiddens, dueling=dueling,
            dueling_activation=dueling_activation, num_atoms=num_atoms,
            use_noisy=use_noisy, v_min=v_min, v_max=v_max,
            sigma0=sigma0, add_layer_norm=add_layer_norm)
        self.inf_mask = None

    def forward(self, input_dict, state, seq_lens):
        # self.inf_mask = ...  # your masking logic goes here
        return super().forward(input_dict, state, seq_lens)

    def get_q_value_distributions(self, model_out):
        """Returns distributional values for Q(s, a) given a state embedding.

        Override this in your custom model to customize the Q output head.

        Args:
            model_out (Tensor): Embedding from the model layers.

        Returns:
            (action_scores, logits, dist) if num_atoms == 1, otherwise
            (action_scores, z, support_logits_per_action, logits, dist)
        """
        # action_scores, logits, *rest = super().get_q_value_distributions(model_out)
        # apply masking here using self.inf_mask
        # return ...
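Then you would register it and point the config at it, roughly like this (the registered name "my_dqn_model" is just a placeholder I made up):

```python
from ray.rllib.models import ModelCatalog

# register the custom model under a name of your choosing
ModelCatalog.register_custom_model("my_dqn_model", MyVeryOwnDQNTorchModel)

# then reference that name in the trainer config
config = {
    "framework": "torch",
    "model": {
        "custom_model": "my_dqn_model",
    },
}
```

This is the same registration mechanism as before; the only change is which class is registered.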

Thanks! I tried something like that, but when I called super on the forward it said that DQNTorchModel didn’t have a forward implemented (NotImplementedError). Do you know why that would be? It’s probably something like the forward gets built automatically but when you go the custom model route that somehow disrupts that process. Or something. The model building part is very confusing to me.
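If I'm reading the ModelCatalog code right, it builds the final model class dynamically, something like this toy analogy (all names here are made up, not rllib's):

```python
# Toy analogy for how ModelCatalog seems to assemble the final class:
# the base model supplies forward(), while the model_interface
# (DQNTorchModel's role) supplies the Q-head methods but no forward().

class QHeadInterface:          # plays the role of DQNTorchModel
    def get_q_value_distributions(self, model_out):
        return [v + 1 for v in model_out]  # placeholder Q head

class BaseNet:                 # plays the role of FullyConnectedNetwork
    def forward(self, x):
        return [2 * v for v in x]

# roughly what passing model_interface=... ends up producing:
CombinedModel = type("CombinedModel", (BaseNet, QHeadInterface), {})

m = CombinedModel()
out = m.forward([1, 2, 3])             # resolved on BaseNet
q = m.get_q_value_distributions(out)   # resolved on QHeadInterface
```

Subclassing the interface alone gives you no forward at all, which would explain the NotImplementedError I'm seeing.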

I ended up subclassing my model from DQNTorchModel but defining the whole thing manually and ignoring q_hiddens. It worked a lot worse than PPO, even with discrete actions. So I probably did something wrong.


The abstractions abound.

Here is a colab that you can build off of.

P.S. Sorry this is so ugly =|

Wow, very cool! Thanks @mannyv!