I use the PPO algorithm without a shared actor-critic network.
I want to get the standard deviation of the policy (actor network) of the trained agent.
How to get the policy distribution?
Hey @Xim_Lee, I'm not sure I 100% understand what you are trying to get: the mean/stddev of all of the actor network's weights, or the outputs (parameterizing an action distribution) of the policy network for a given observation?
Thank you for the reply, @sven1977!
Sorry, I guess I didn’t say it clearly.
I mean the stddev of outputs of the policy network for a given observation.
The policy is a distribution over actions for a specific observation, and PPO selects actions by sampling from this stochastic policy. So I want the stddev of that policy distribution.
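For a continuous action space, a common setup (including RLlib's default diagonal Gaussian distribution) is that the policy network outputs the mean and log-std of the action distribution concatenated along the last axis; the stddev is then just the exponential of the log-std half. Below is a minimal sketch of that split, using a made-up `action_dist_inputs` vector for a hypothetical 2-dim action space (in RLlib you could obtain such a vector, e.g., via `compute_single_action(obs, full_fetch=True)` and the `"action_dist_inputs"` extra fetch, but the exact call depends on your version):

```python
import numpy as np

# Hypothetical model output for one observation: mean and log-std
# concatenated (diagonal Gaussian parameterization), 2-dim actions.
action_dist_inputs = np.array([0.3, -0.1, -0.5, -1.2])

# Split into the mean half and the log-std half.
mean, log_std = np.split(action_dist_inputs, 2)

# The stddev of the policy distribution is exp(log_std).
stddev = np.exp(log_std)

print("mean:", mean)      # [ 0.3 -0.1]
print("stddev:", stddev)  # exp([-0.5, -1.2])
```

Note that in PPO the log-std is often a state-independent learned parameter, in which case the stddev is the same for every observation; if the model outputs it per-observation, it will vary with the input.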