How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am struggling to understand the proper way to use a custom model with the DQN algorithm. The steps I'm taking are:
1. Create a class `MyModel` that extends `TFModelV2`:
```python
from typing import Dict
from gym.spaces import Dict as GymDict, Discrete
import tensorflow as tf
from ray.rllib.models.tf.tf_modelv2 import TFModelV2
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork


class MyModel(TFModelV2):
    def __init__(self, obs_space: GymDict, act_space: Discrete, num_outputs: int,
                 model_config: Dict, name: str):
        super().__init__(obs_space, act_space, num_outputs, model_config, name)
        # Hidden layers (fcnet_hiddens) producing the embedding.
        self.internal_model = FullyConnectedNetwork(
            obs_space, act_space, num_outputs, model_config, name + '_internal')
        # Final layer mapping the embedding to one Q-value per action.
        self.final_layer = tf.keras.layers.Dense(
            act_space.n, name='q_values', activation=None)

    def forward(self, input_dict, state, seq_lens):
        logits, _ = self.internal_model({'obs': input_dict['obs_flat']})
        q_values = self.final_layer(logits)
        self._value = tf.math.reduce_max(q_values, axis=1)
        return q_values, state

    def value_function(self):
        return self._value
```
My questions about this step:
- How is this handled by the `DistributionalQTFModel` class? I see that RLlib uses it for the policy even though I'm using the default configuration `num_atoms: 1`, but I don't really understand how or where my model interacts with it (see my guess in the sketch right after these questions).
- The `num_outputs` parameter received in the constructor is wrong, since it refers to the size of the last hidden layer and not to the number of Q-values. This breaks the policy, since it expects a different number of outputs (see the stack trace below). How does this work? Where and how is the policy created, and why does it use the size of the last hidden layer even if `no_final_linear` is `True`?
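From reading the DQN policy code, my best guess at the intended pattern is that the custom model should return an embedding of size `num_outputs`, and that `DistributionalQTFModel` then adds its own `q_value_head` on top of it to produce the `act_space.n` Q-values. Is something like the sketch below what I am supposed to do instead? (`MyDQNModel` is just a name I made up; the import paths are what I believe they are in my Ray version.)

```python
from ray.rllib.agents.dqn.distributional_q_tf_model import DistributionalQTFModel
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork


class MyDQNModel(DistributionalQTFModel):
    """Guess: return an embedding and let the wrapper build the Q head."""

    def __init__(self, obs_space, act_space, num_outputs, model_config, name, **kw):
        # **kw forwards q_hiddens / dueling / num_atoms etc. to the wrapper.
        super().__init__(obs_space, act_space, num_outputs, model_config, name, **kw)
        self.internal_model = FullyConnectedNetwork(
            obs_space, act_space, num_outputs, model_config, name + '_internal')

    def forward(self, input_dict, state, seq_lens):
        # Return the num_outputs-sized embedding; the q_value_head built by
        # DistributionalQTFModel would then map it to act_space.n Q-values.
        model_out, _ = self.internal_model({'obs': input_dict['obs_flat']})
        return model_out, state
```

If this is the intended pattern, then my original `MyModel` adds a second Q head of size `act_space.n`, which would also explain the shape mismatch in the stack trace below.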
2. Register the model:

```python
from ray.rllib.models import ModelCatalog

ModelCatalog.register_custom_model("my_model", MyModel)
```
3. Set the model in the configuration:

```python
config = {
    'dueling': False,  # No dueling, so no separate value branch is needed.
    'model': {
        'custom_model': 'my_model',
        # Even if I set this to False, my model receives True anyway
        # in model_config['no_final_linear']. Why?
        'no_final_linear': True,
        'fcnet_hiddens': [1024, 1024],
    },
}
```
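For completeness, this is roughly how I create the trainer (the environment name here is a placeholder; my real setup is multi-agent):

```python
import ray
from ray.rllib.agents.dqn import DQNTrainer

ray.init()
trainer = DQNTrainer(
    env='CartPole-v0',  # Placeholder; my real env is multi-agent.
    config={
        'framework': 'tf',
        'dueling': False,
        'model': {
            'custom_model': 'my_model',  # Registered in step 2.
            'no_final_linear': True,
            'fcnet_hiddens': [1024, 1024],
        },
    },
)
print(trainer.train())
```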
Stack trace of policy trying to use the wrong number of outputs:
```
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 584, in __init__
(pid=7547) self._build_policy_map(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1384, in _build_policy_map
(pid=7547) self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/policy/policy_map.py", line 133, in create_policy
(pid=7547) self[policy_id] = class_(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/policy/tf_policy_template.py", line 238, in __init__
(pid=7547) DynamicTFPolicy.__init__(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 295, in __init__
(pid=7547) action_distribution_fn(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/agents/dqn/dqn_tf_policy.py", line 219, in get_distribution_inputs_and_class
(pid=7547) q_vals = compute_q_values(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/agents/dqn/dqn_tf_policy.py", line 352, in compute_q_values
(pid=7547) dist) = model.get_q_value_distributions(model_out)
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/agents/dqn/distributional_q_tf_model.py", line 184, in get_q_value_distributions
(pid=7547) return self.q_value_head(model_out)
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/keras/engine/base_layer_v1.py", line 739, in __call__
(pid=7547) input_spec.assert_input_compatibility(self.input_spec, inputs,
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/keras/engine/input_spec.py", line 263, in assert_input_compatibility
(pid=7547) raise ValueError(f'Input {input_index} of layer "{layer_name}" is '
(pid=7547) ValueError: Input 0 of layer "model_1" is incompatible with the layer: expected shape=(None, 1024), found shape=(None, 3)
```
It might help: if in the model config I set

```python
'fcnet_hiddens': [1024, 1024, 3]  # 3 is the number of actions for an agent
```

and directly use `self.internal_model` to compute the Q-values (i.e., removing `self.final_layer`), things work. But I cannot do this, since I will have several agents, each with a different number of actions.
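For context, the multi-agent part of the setup will look roughly like this (agent ids, observation space, and action sizes below are made up, and the exact `policy_mapping_fn` signature depends on the Ray version), which is why I cannot hardcode the number of actions into `fcnet_hiddens`:

```python
import numpy as np
from gym.spaces import Box, Discrete

obs_space = Box(-1.0, 1.0, shape=(10,), dtype=np.float32)

multiagent_part = {
    'multiagent': {
        'policies': {
            # (policy_cls, obs_space, act_space, per-policy config overrides)
            'agent_small': (None, obs_space, Discrete(3), {}),
            'agent_large': (None, obs_space, Discrete(5), {}),
        },
        # Each agent id maps to its own policy (and hence its own act_space.n).
        'policy_mapping_fn': lambda agent_id: agent_id,
    },
}
```

Each policy builds its own copy of the custom model with its own `act_space`, so whatever the correct pattern is, it has to get the number of Q-values from `act_space.n` rather than from the config.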