[RLlib] PPO default config output activation function

Hi @Mirakolix_Gallier,

There is no activation function on the outputs of either the policy or value heads. The outputs are the unaltered values of the final linear layer.

For the policy head, these raw outputs are fed into an action distribution as its inputs. For a Discrete action space that is the Categorical action distribution, which treats them as unnormalized logits.
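To make that concrete, here is a minimal sketch in plain Python of how raw final-layer outputs can be turned into a categorical distribution and sampled. It is not RLlib's code: the `softmax` and `sample_action` helpers and the example logits are made up for illustration, and only the standard library is used.

```python
import math
import random


def softmax(logits):
    """Convert unnormalized logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample one action index from the categorical distribution over logits."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical raw outputs of the final linear layer for 3 discrete actions.
# Note that they are unbounded and can be negative: no activation was applied.
logits = [2.0, -1.0, 0.5]

probs = softmax(logits)
action = sample_action(logits, random.Random(0))
```

The normalization happens inside the distribution rather than in the network, which is why the model itself does not need an output activation.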

For the value head, the raw output is used directly as the value estimate and is trained to match the appropriate value target for a given observation.
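As a rough illustration of why the value output is left unbounded, the sketch below computes a linear value prediction and a squared-error loss against a target. The weights, features, target, and the helper names `linear` and `value_loss` are all hypothetical, and the squared-error loss is used here only as an example of a regression objective.

```python
def linear(weights, bias, features):
    """A final linear layer with a single output and no activation."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def value_loss(prediction, target):
    """Squared error between the value prediction and its target."""
    return (prediction - target) ** 2


# Hypothetical features from the last hidden layer and value-head parameters.
features = [0.5, -1.0, 2.0]
weights = [1.0, 2.0, -3.0]
bias = 0.5

value = linear(weights, bias, features)  # can be any real number, here negative
loss = value_loss(value, target=-5.0)    # hypothetical value target
```

A bounded activation such as tanh or sigmoid on this output would make targets outside its range unreachable, so a plain linear output is the natural choice for regression.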
