Saved ONNX model using DQN dueling policy

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I have trained a DQN dueling agent using RLlib, restored a checkpoint, and saved the model to ONNX. When I run the exported model for inference, the output has size 256 rather than the size of my action space. I suspect the reason is the post-processing RLlib applies to support the dueling mechanism.
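For reference, this is roughly what I am doing (a minimal sketch; `"CartPole-v1"` and `checkpoint_path` stand in for my actual environment and checkpoint):

```python
from ray.rllib.algorithms.dqn import DQNConfig

# Rebuild the algorithm and restore the trained checkpoint.
algo = (
    DQNConfig()
    .environment("CartPole-v1")   # placeholder for my actual env
    .training(dueling=True)
    .build()
)
algo.restore(checkpoint_path)

# Export the policy's model to ONNX (opset 11).
algo.export_policy_model("./onnx_model", onnx=11)
```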

It therefore seems that RLlib does not currently export the proper output size to ONNX (or TorchScript) for models trained with dueling DQN.

My question is: since the final-layer weights must be stored somewhere (after all, `compute_single_action(obs)` works within Ray), how can I access these weights to build a final ONNX model for inference?
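In case it clarifies what I mean, I can already inspect the weights like this (Ray 2.x, Torch policy); what I am missing is how to assemble them into the final Q head:

```python
policy = algo.get_policy()

# Dict of numpy arrays keyed by layer name, including the dueling heads.
weights = policy.get_weights()
for name, array in weights.items():
    print(name, array.shape)

# The underlying torch.nn.Module is also reachable directly.
torch_model = policy.model
print(torch_model)
```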

Ran into this same problem using Ray 2.1.0. Any suggestions from the Ray team on how to work around this issue?

Same issue here with Ray 2.3.0!

cc: @arturn @gjoliver Any suggestions?

256 is the size of the last layer of the base model, not the action output. RLlib passes this output through a distributional model to compute the final Q-values.

The reason we do that is to support distributional Q-learning.
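Concretely, in the standard distributional (C51-style) formulation, the head predicts a probability distribution over a fixed support of atoms, and the scalar Q-value is the expectation over that support:

$$Q(s, a) = \sum_{i=1}^{N_{\text{atoms}}} z_i \, p_i(s, a)$$

where the $z_i$ are the fixed atom values and $p_i(s, a)$ the predicted probabilities. This is why the raw 256-dim embedding alone does not give you Q-values.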

I would suggest using RLlib to serve the model; `algo.compute_single_action(obs, explore=False)` applies all of this post-processing for you. Alternatively, replicate the post-processing yourself if you want to use the model outside of RLlib.
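If you do want a standalone ONNX graph, one option is to wrap the base model and the dueling heads into a single torch module and export that yourself. A minimal sketch, assuming the default `DQNTorchModel` with `num_atoms=1` and a flat Box observation space; the attribute names `advantage_module` and `value_module` come from that class and may differ across Ray versions, so verify them against your installed source:

```python
import torch

policy = algo.get_policy()
rllib_model = policy.model  # assumed to be a DQNTorchModel

class DuelingQNet(torch.nn.Module):
    """Wraps the base net plus dueling heads so the exported graph
    outputs final Q-values instead of the 256-dim embedding."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, obs):
        # Base network -> feature embedding (the 256-dim output you saw).
        features, _ = self.model({"obs": obs}, [], None)
        advantages = self.model.advantage_module(features)  # (B, num_actions)
        value = self.model.value_module(features)           # (B, 1)
        # Standard dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
        return value + advantages - advantages.mean(dim=1, keepdim=True)

dummy_obs = torch.randn(1, *policy.observation_space.shape)
torch.onnx.export(
    DuelingQNet(rllib_model),
    dummy_obs,
    "dueling_q.onnx",
    input_names=["obs"],
    output_names=["q_values"],
)
```

Note this sketch skips the distributional case (`num_atoms > 1`), where you would also need the expectation over atoms shown above.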
