Requesting Clarification about Network Architecture Design

Hi all,

I'd like some clarification about the network architectures used in RLlib. In RLlib, most algorithms (PPO, DQN, etc.) train a neural network approximation of a policy. Does that mean there is some relationship or constraint between the total number of actions and the number of nodes in the output layer?

Yes. The number of outputs relates to the actions through the action distribution.
In the link you can find the action distributions that RLlib uses. If you look each one up, you can see how many output neurons a single action needs.
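
To make the relationship concrete, here is a rough sketch of how the output count could follow from the action distribution. It is plain Python, not RLlib code: the function `required_outputs` and the string labels for the space types are made up for illustration, and the pairings (categorical for discrete actions, one categorical per sub-action for multi-discrete, diagonal Gaussian for continuous actions) are the common defaults as I understand them, so please check the catalog for your RLlib version.

```python
from math import prod


def required_outputs(space_kind, spec):
    """Return how many output nodes a policy head would need.

    Hypothetical helper for illustration only; not part of RLlib.
    space_kind: "discrete", "multi_discrete", or "box" (labels invented here).
    spec: number of actions, list of sub-action sizes, or the action shape.
    """
    if space_kind == "discrete":
        # Categorical distribution: one logit per action.
        return spec
    if space_kind == "multi_discrete":
        # One categorical per sub-action, with all logits concatenated.
        return sum(spec)
    if space_kind == "box":
        # Diagonal Gaussian: a mean and a log-std per action dimension.
        return 2 * prod(spec)
    raise ValueError(f"unknown space kind: {space_kind}")


print(required_outputs("discrete", 4))             # 4
print(required_outputs("multi_discrete", [3, 2]))  # 5
print(required_outputs("box", (2,)))               # 4
```

So the output layer size is determined by the action space together with the distribution chosen for it, not by the action count alone.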


Thanks for the response, @arturn.

For further clarification,

Ques 1) If I understood the catalog code correctly, for a discrete action space, the number of output nodes in the policy network is the same as the size of the action space. Is that correct?

Ques 2) If not, does that mean the number of discrete actions in the MDP can be larger than the number of output nodes of the neural network approximating the policy?

@sven1977, can you please add any information from your understanding?

Hi @Saurabh_Arora,

Q1 is correct: the number of outputs matches the number of possible actions.
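
As a small illustration of why the counts match, here is a sketch in plain Python (not RLlib code). It assumes the usual categorical setup, where the network emits one logit per action and a softmax turns the logits into action probabilities. The logit values and the helper names `softmax` and `sample_action` are made up for the example.

```python
import math
import random


def softmax(logits):
    """Convert raw network outputs (logits) into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample an action index from the categorical distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


num_actions = 3             # size of the discrete action space
logits = [0.5, -1.0, 2.0]   # hypothetical network outputs, one per action
probs = softmax(logits)
action = sample_action(logits, random.Random(0))

print(len(probs) == num_actions)  # True
print(0 <= action < num_actions)  # True
```

Under this setup every action index gets its probability from its own logit, so an action without an output node could never be selected.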
