How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am struggling to understand the proper way to use a custom model with the DQN algorithm. The steps I'm taking are:
1. Create a class `MyModel` that extends `TFModelV2`:
```python
from typing import Dict
from gym.spaces import Dict as GymDict, Discrete
import tensorflow as tf
from ray.rllib.models.tf.tf_modelv2 import TFModelV2
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork


class MyModel(TFModelV2):
    def __init__(self, obs_space: GymDict, act_space: Discrete, num_outputs: int,
                 model_config: Dict, name: str):
        super().__init__(obs_space, act_space, num_outputs, model_config, name)
        # Hidden layers (fcnet_hiddens) producing the embedding.
        self.internal_model = FullyConnectedNetwork(
            obs_space, act_space, num_outputs, model_config, name + '_internal')
        # Final layer mapping the embedding to one Q-value per action.
        self.final_layer = tf.keras.layers.Dense(
            act_space.n, name='q_values', activation=None)

    def forward(self, input_dict, state, seq_lens):
        logits, _ = self.internal_model({'obs': input_dict['obs_flat']})
        q_values = self.final_layer(logits)
        self._value = tf.math.reduce_max(q_values, axis=1)
        return q_values, state

    def value_function(self):
        return self._value
```
My questions about this step:
- How is this handled by the `DistributionalQTFModel` class? I see that RLlib uses it for the policy even though I'm using the default configuration `num_atoms: 1`, but I don't really understand how or where my model interacts with it (see my guess in the sketch right after these questions).
- The `num_outputs` parameter received in the constructor is wrong, since it refers to the size of the last hidden layer and not to the number of Q-values. This breaks the policy, since it expects a different number of outputs (see the stack trace below). How does this work? Where and how is the policy created, and why does it use the size of the last hidden layer even if `no_final_linear` is `True`?
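From reading the DQN policy code, my best guess at the intended pattern is that the custom model should return an embedding of size `num_outputs`, and that `DistributionalQTFModel` then adds its own `q_value_head` on top of it to produce the `act_space.n` Q-values. Is something like the sketch below what I am supposed to do instead? (`MyDQNModel` is just a name I made up; the import paths are what I believe they are in my Ray version.)

```python
from ray.rllib.agents.dqn.distributional_q_tf_model import DistributionalQTFModel
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork


class MyDQNModel(DistributionalQTFModel):
    """Guess: return an embedding and let the wrapper build the Q head."""

    def __init__(self, obs_space, act_space, num_outputs, model_config, name, **kw):
        # **kw forwards q_hiddens / dueling / num_atoms etc. to the wrapper.
        super().__init__(obs_space, act_space, num_outputs, model_config, name, **kw)
        self.internal_model = FullyConnectedNetwork(
            obs_space, act_space, num_outputs, model_config, name + '_internal')

    def forward(self, input_dict, state, seq_lens):
        # Return the num_outputs-sized embedding; the q_value_head built by
        # DistributionalQTFModel would then map it to act_space.n Q-values.
        model_out, _ = self.internal_model({'obs': input_dict['obs_flat']})
        return model_out, state
```

If this is the intended pattern, then my original `MyModel` adds a second Q head of size `act_space.n`, which would also explain the shape mismatch in the stack trace below.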
2. Register the model:

```python
from ray.rllib.models import ModelCatalog

ModelCatalog.register_custom_model("my_model", MyModel)
```
3. Set the model in the configuration:

```python
config = {
    'dueling': False,  # No dueling, so no separate value branch is needed.
    'model': {
        'custom_model': 'my_model',
        # Even if I set this to False, my model receives True anyway
        # in model_config['no_final_linear']. Why?
        'no_final_linear': True,
        'fcnet_hiddens': [1024, 1024],
    },
}
```
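For completeness, this is roughly how I create the trainer (the environment name here is a placeholder; my real setup is multi-agent):

```python
import ray
from ray.rllib.agents.dqn import DQNTrainer

ray.init()
trainer = DQNTrainer(
    env='CartPole-v0',  # Placeholder; my real env is multi-agent.
    config={
        'framework': 'tf',
        'dueling': False,
        'model': {
            'custom_model': 'my_model',  # Registered in step 2.
            'no_final_linear': True,
            'fcnet_hiddens': [1024, 1024],
        },
    },
)
print(trainer.train())
```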
Stack trace of policy trying to use the wrong number of outputs:
```
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 584, in __init__
(pid=7547) self._build_policy_map(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1384, in _build_policy_map
(pid=7547) self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/policy/policy_map.py", line 133, in create_policy
(pid=7547) self[policy_id] = class_(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/policy/tf_policy_template.py", line 238, in __init__
(pid=7547) DynamicTFPolicy.__init__(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 295, in __init__
(pid=7547) action_distribution_fn(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/agents/dqn/dqn_tf_policy.py", line 219, in get_distribution_inputs_and_class
(pid=7547) q_vals = compute_q_values(
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/agents/dqn/dqn_tf_policy.py", line 352, in compute_q_values
(pid=7547) dist) = model.get_q_value_distributions(model_out)
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/ray/rllib/agents/dqn/distributional_q_tf_model.py", line 184, in get_q_value_distributions
(pid=7547) return self.q_value_head(model_out)
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/keras/engine/base_layer_v1.py", line 739, in __call__
(pid=7547) input_spec.assert_input_compatibility(self.input_spec, inputs,
(pid=7547) File "/home/fedetask/Desktop/vtl/venv/lib/python3.9/site-packages/keras/engine/input_spec.py", line 263, in assert_input_compatibility
(pid=7547) raise ValueError(f'Input {input_index} of layer "{layer_name}" is '
(pid=7547) ValueError: Input 0 of layer "model_1" is incompatible with the layer: expected shape=(None, 1024), found shape=(None, 3)
```
It might help: if in the model config I set

```python
'fcnet_hiddens': [1024, 1024, 3]  # 3 is the number of actions for an agent
```

and directly use `self.internal_model` to compute the Q-values (i.e., removing `self.final_layer`), things work. But I cannot do this, since I will have several agents, each with a different number of actions.
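For context, the multi-agent part of the setup will look roughly like this (agent ids, observation space, and action sizes below are made up, and the exact `policy_mapping_fn` signature depends on the Ray version), which is why I cannot hardcode the number of actions into `fcnet_hiddens`:

```python
import numpy as np
from gym.spaces import Box, Discrete

obs_space = Box(-1.0, 1.0, shape=(10,), dtype=np.float32)

multiagent_part = {
    'multiagent': {
        'policies': {
            # (policy_cls, obs_space, act_space, per-policy config overrides)
            'agent_small': (None, obs_space, Discrete(3), {}),
            'agent_large': (None, obs_space, Discrete(5), {}),
        },
        # Each agent id maps to its own policy (and hence its own act_space.n).
        'policy_mapping_fn': lambda agent_id: agent_id,
    },
}
```

Each policy builds its own copy of the custom model with its own `act_space`, so whatever the correct pattern is, it has to get the number of Q-values from `act_space.n` rather than from the config.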