[RLlib] PPO default config output activation function

mannyv · January 28, 2023, 10:32pm

There is no activation function on the outputs of either the policy or value heads. The outputs are the unaltered values of the final linear layer.

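For the policy head, these raw outputs are treated as logits and fed into an action distribution. In the case of a Discrete action space that is the Categorical action distribution, which normalizes the logits into action probabilities internally.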
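To make the policy side concrete, here is a minimal sketch of what a categorical distribution does with raw logits. It uses only the Python standard library rather than RLlib's own classes, and the names `softmax` and `sample_action` are made up for illustration.

```python
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits, rng):
    # The normalization happens here, inside the distribution,
    # not as an activation on the network's final layer.
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical raw outputs of the policy head for 3 discrete actions.
logits = [2.0, -1.0, 0.5]
probs = softmax(logits)
action = sample_action(logits, random.Random(0))
```

The logits can be any real numbers, which is why no output activation is needed on the policy head.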

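For the value head, the raw output is used directly as the value estimate for a given observation, and it is trained by regression toward the value targets. Leaving it unbounded lets the estimate cover whatever range the returns take.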
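As a rough illustration of the value side, the sketch below shows a plain linear head with no activation and a simplified squared-error term. This is not RLlib's actual loss code (the real PPO loss has additional terms), and `value_head`, `squared_error`, and the numbers are hypothetical.

```python
def value_head(features, weights, bias):
    # Plain linear layer with no activation, so the output is unbounded.
    return sum(w * f for w, f in zip(weights, features)) + bias

def squared_error(prediction, target):
    # Simplified regression term pulling the estimate toward the target.
    return (prediction - target) ** 2

# Hypothetical features from the last hidden layer and head parameters.
features = [0.5, -1.0, 2.0]
weights = [1.0, 2.0, -3.0]
value = value_head(features, weights, bias=0.5)
loss = squared_error(value, target=-5.0)
```

Here the estimate comes out negative, which an output activation such as ReLU or sigmoid would not allow.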
