How severely does this issue affect your experience of using Ray?
- Medium: I need an explanation for better understanding.
What are the default activation functions used in PPO with a discrete action space? Specifically, what is the output activation of the policy network, and what is the output activation of the value network?
Moreover, I would be very happy about a hint on where I can find the respective code on GitHub!
There is no activation function on the outputs of either the policy head or the value head; the outputs are the raw values of the final linear layer.
For the policy, these raw outputs (logits) are fed into an action distribution. For a Discrete action space, this is the Categorical action distribution.
For the value head, the raw output is trained to be the value estimate for the given observation.
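To make this concrete, here is a minimal PyTorch sketch (not RLlib's actual model code) of a policy/value network matching the description above: the final linear layers have no output activation, the raw policy logits parameterize a `Categorical` distribution, and the value head emits a raw scalar. The class name, layer sizes, and hidden activation are illustrative assumptions, not RLlib defaults.

```python
import torch
import torch.nn as nn


class PolicyValueNet(nn.Module):
    """Illustrative sketch of a PPO policy/value network for a Discrete
    action space. Note: no activation on either output head."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        # Hidden activation (Tanh) chosen purely for illustration.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.policy_head = nn.Linear(hidden, num_actions)  # raw logits, no activation
        self.value_head = nn.Linear(hidden, 1)             # raw scalar, no activation

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        logits = self.policy_head(h)             # unaltered final-layer outputs
        value = self.value_head(h).squeeze(-1)   # unaltered value estimate
        return logits, value


# Usage: the softmax lives inside the distribution, not the network.
net = PolicyValueNet(obs_dim=8, num_actions=3)
obs = torch.randn(4, 8)
logits, value = net(obs)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()
```

The key point is that normalization into probabilities happens inside `Categorical(logits=...)`, so the network itself ends with plain linear layers.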