Using a custom model with DQN is really not straightforward from the examples.
My custom model is a subclass of DQNTorchModel: I customized get_q_value_distributions(), but when I train, I get the following error:
ValueError: The parameter logits has invalid values
I looked at the example parametric_actions_cartpole.py to try to understand what's going on, but I'm even more confused.
In this example, a custom model, TorchParametricActionsModel, is used. This model is a subclass of DQNTorchModel, but only forward() is overridden (not get_q_value_distributions()).
I thought that in DQN the action is selected by taking the one with the maximum Q-value?
So how can the action be selected if get_q_value_distributions() is not implemented?
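
To make the question concrete, here is a minimal sketch of what I suspect is happening, assuming the base class ships a default Q-value head that a subclass inherits when it only overrides forward(). The classes BaseQModel and ParametricModel, the method get_q_values(), and the helper greedy_action() are hypothetical stand-ins written in plain Python; they are not RLlib's actual classes or signatures.

```python
# Stand-in classes for illustration only; these are NOT RLlib's real classes.


class BaseQModel:
    """Hypothetical stand-in for a DQN base model with a default Q-value head."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def forward(self, obs):
        # Default feature extractor: pass the observation through unchanged.
        return list(obs)

    def get_q_values(self, model_out):
        # Default Q-head: the features are read directly as per-action scores.
        return model_out[: self.num_actions]


class ParametricModel(BaseQModel):
    """Overrides only forward(); get_q_values() is inherited from BaseQModel."""

    def forward(self, obs):
        # obs is a (scores, mask) pair; unavailable actions get a score of -inf.
        scores, mask = obs
        return [s if m else float("-inf") for s, m in zip(scores, mask)]


def greedy_action(model, obs):
    # Greedy selection: index of the largest Q-value.
    q_values = model.get_q_values(model.forward(obs))
    return max(range(len(q_values)), key=q_values.__getitem__)


model = ParametricModel(num_actions=3)
action = greedy_action(model, ([0.2, 1.5, 0.7], [True, False, True]))
print(action)  # prints 2: action 1 has the highest raw score but is masked out
```

If that is how it works, the subclass in the example never needs to define the Q-value method itself, and the argmax still runs over whatever forward() returns. I would appreciate confirmation that this matches the real DQNTorchModel.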