How severely does this issue affect your experience of using Ray?
- Medium: I need an explanation for better understanding.
What are the default activation functions used in PPO with a discrete action space? Specifically, what is the output activation of the policy network, and what is the output activation of the value network?
Moreover, I would be very happy about a hint on where I can find the respective code on GitHub!
There is no activation function on the outputs of either the policy head or the value head; the outputs are the raw values of the final linear layer.
For the policy, these raw outputs (logits) are fed into an action distribution. For a Discrete action space, this is the Categorical action distribution.
For the value head, the raw output is trained to be the value estimate for the given observation.
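To make this concrete, here is a minimal PyTorch sketch (not RLlib's actual model code) of a policy/value network matching the description above: the final linear layers have no output activation, the raw policy logits parameterize a `Categorical` distribution, and the value head emits a raw scalar. The class name, layer sizes, and hidden activation are illustrative assumptions, not RLlib defaults.

```python
import torch
import torch.nn as nn


class PolicyValueNet(nn.Module):
    """Illustrative sketch of a PPO policy/value network for a Discrete
    action space. Note: no activation on either output head."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        # Hidden activation (Tanh) chosen purely for illustration.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.policy_head = nn.Linear(hidden, num_actions)  # raw logits, no activation
        self.value_head = nn.Linear(hidden, 1)             # raw scalar, no activation

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        logits = self.policy_head(h)             # unaltered final-layer outputs
        value = self.value_head(h).squeeze(-1)   # unaltered value estimate
        return logits, value


# Usage: the softmax lives inside the distribution, not the network.
net = PolicyValueNet(obs_dim=8, num_actions=3)
obs = torch.randn(4, 8)
logits, value = net(obs)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()
```

The key point is that normalization into probabilities happens inside `Categorical(logits=...)`, so the network itself ends with plain linear layers.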