I am reading the link: ray/parametric_actions_model.py at master · ray-project/ray · GitHub. The class TorchParametricActionsModel(DQNTorchModel) inherits from DQNTorchModel. This is a bit confusing, since the custom function is supposed to be a policy function, but DQN does not have an explicit policy function. Does anyone have more knowledge on this? Thanks in advance.
Hi @mingjunwang88, can you clarify your question a little bit? You mentioned that the custom function is supposed to be a policy function, but DQN does not have an explicit policy function.
What do you mean by the custom function and policy function?
This example shows how you can extend a DQN algorithm / model to search over a large number of discrete actions (say 10000 actions). Instead of having an output head with 10000 logits, you predict an embedding of the observation and compute the inner product of that embedding with the embeddings of those 10000 actions to produce scores (logits) over the actions. This is what is shown in this file.
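To make the inner-product idea concrete, here is a minimal sketch in plain Python. It is not the code from the linked file: the names `action_scores`, `softmax`, `intent`, `action_embeddings`, and `action_mask` are hypothetical, the embeddings are random placeholders, and a real model would do this as a batched tensor operation. It also assumes some actions can be marked invalid with a 0/1 mask.

```python
import math
import random


def action_scores(intent, action_embeddings, action_mask):
    """Score every candidate action by its inner product with the intent vector.

    Actions whose mask entry is 0 get -inf so they can never be selected.
    """
    scores = []
    for embedding, valid in zip(action_embeddings, action_mask):
        dot = sum(i * e for i, e in zip(intent, embedding))
        scores.append(dot if valid else -math.inf)
    return scores


def softmax(scores):
    """Turn scores into probabilities (numerically stable form)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
embed_dim, num_actions = 4, 10000

# Placeholder for the network output: one small vector, not 10000 logits.
intent = [random.gauss(0, 1) for _ in range(embed_dim)]
# Placeholder embeddings, one per candidate action.
action_embeddings = [
    [random.gauss(0, 1) for _ in range(embed_dim)] for _ in range(num_actions)
]
# Assumed 0/1 validity mask; here only action 0 is unavailable.
action_mask = [1] * num_actions
action_mask[0] = 0

scores = action_scores(intent, action_embeddings, action_mask)
probs = softmax(scores)
best_action = max(range(num_actions), key=scores.__getitem__)
```

If these scores are treated as Q-values, which is an assumption about how the DQN setup consumes the model output, then the greedy choice is simply `best_action`, the argmax over the scores. That would be the implicit policy in DQN, so no separate policy head is needed.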