I'd like some clarification about the network architectures used in RLlib. In RLlib, most algorithms (PPO, DQN, etc.) train a neural-network approximation of a policy. Does that mean there is some relationship/constraint between the total number of actions and the number of nodes in the output layer?
> Does that mean there is some relationship/constraint between the total number of actions and the number of nodes in the output layer?
Yes. The number of outputs relates to the actions through the action distribution.
At the link you can find the action distributions that RLlib uses; for each one, you can look up how many output neurons a single action requires.
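To make the mapping concrete, here is a small sketch (not RLlib's actual API; the function name and arguments are made up for illustration) of how common action distributions determine the size of the policy network's output layer:

```python
# Sketch (hypothetical helper, not RLlib code): how an action space's
# distribution determines the policy-network output-layer size.
def num_outputs(space_type, n=None, dim=None):
    if space_type == "discrete":
        # Categorical distribution: one logit per discrete action.
        return n
    if space_type == "box":
        # Diagonal Gaussian: one mean and one log-std per action dimension.
        return 2 * dim
    raise ValueError(f"unknown space type: {space_type}")

print(num_outputs("discrete", n=4))  # Discrete(4) -> 4 output nodes
print(num_outputs("box", dim=3))     # 3-dim continuous action -> 6 output nodes
```

So for a discrete space the output layer matches the action count one-to-one, while a continuous space typically needs two outputs per action dimension.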
Ques 1) If I understood the catalog code correctly, for a discrete action space the number of output nodes in the policy network is the same as the size of the action space. Is that correct?
Ques 2) If that is not correct, does that mean the number of discrete actions in the MDP can exceed the number of output nodes of the neural network approximating the policy?
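For Ques 1, a minimal sketch (plain NumPy, not RLlib internals) of why the counts must match with a categorical action distribution: the network's logits are turned into one probability per action, so a `Discrete(5)` space needs exactly 5 output nodes, and fewer nodes would leave some actions impossible to sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 5                                   # size of Discrete(5)
logits = rng.normal(size=n_actions)             # policy-network output layer
probs = np.exp(logits) / np.exp(logits).sum()   # softmax: one prob per action
action = rng.choice(n_actions, p=probs)         # sample one action index
print(len(probs) == n_actions)  # -> True: one output node per action
```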
@sven1977, could you please add information from your understanding?