Hi @carlorop ,
- Also, in reference to your other post: StochasticSampling, if used in an algorithm, will be called inside the policies, like here.
- The distribution of the actions (and therefore of the noise, if you will) is parameterized by the outputs of your model. For example, for a diagonal Gaussian of dimension l, 2*l model outputs are needed to parameterize the distribution (l means and l log standard deviations).
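As a minimal sketch (not RLlib's actual code), assuming the model emits 2*l values with the first l interpreted as means and the last l as log standard deviations, the sampling step looks roughly like this:

```python
import numpy as np

def sample_diag_gaussian(model_out: np.ndarray, explore: bool = True) -> np.ndarray:
    """Sample an action from a diagonal Gaussian parameterized by model_out.

    Assumed layout (a common convention; the real layout may differ):
    the first l entries of model_out are the means, the last l the
    log standard deviations.
    """
    l = model_out.shape[-1] // 2
    mean, log_std = model_out[..., :l], model_out[..., l:]
    if not explore:
        # Deterministic action: just return the mean.
        return mean
    # Exploratory action: mean + stddev * standard-normal noise.
    return mean + np.exp(log_std) * np.random.standard_normal(mean.shape)
```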
- The parameters of an exploratory action-sampling step depend on the distribution used; the standard deviation is one of them.
- A policy includes the stochastic sampling step in its compute_action methods. Therefore, choosing
explore=True
will lead to the output of an exploratory action.
- The entropy is calculated from the distribution parameters output by your model. This way, the entropy loss can efficiently control the variance.
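To illustrate that last point: for a diagonal Gaussian the entropy has a closed form that depends only on the standard deviations, not the means, which is why an entropy term in the loss directly controls the variance. A hedged sketch (assuming the log-stddev parameterization from above):

```python
import numpy as np

def diag_gaussian_entropy(log_std: np.ndarray) -> float:
    """Closed-form entropy of a diagonal Gaussian.

    H = sum_i (log sigma_i + 0.5 * log(2 * pi * e)).
    The means do not appear, so maximizing entropy only pushes
    the stddevs (i.e., the exploration noise) up.
    """
    return float(np.sum(log_std + 0.5 * np.log(2.0 * np.pi * np.e)))
```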