How does StochasticSampling work?

Hi @carlorop ,

  1. Also in reference to your other post: when an algorithm uses StochasticSampling, it is called from within the policies, like here.
  2. The distribution of the actions (and therefore of the noise, if you will) is parameterized by the outputs of your model. For example, for a diagonal Gaussian of size l, 2*l outputs of the model are needed to parameterize this distribution.
  3. The parameters of an exploratory action sampling step depend on the distribution used. The standard deviation (stddev) is one of these.
  4. A policy includes the stochastic sampling step in its compute_action methods. Therefore, choosing explore=True will lead to an exploratory action being output.
  5. The entropy is calculated based on the distribution parameters output by your model. This way, the entropy loss can efficiently control the variance.
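
To make points 2 to 5 concrete, here is a minimal plain-Python sketch of a diagonal Gaussian action distribution. It is not RLlib's actual code: the helper names (split_params, sample_action, entropy) are made up for illustration, and I am assuming the 2*l model outputs are laid out as l means followed by l log standard deviations.

```python
import math
import random


def split_params(model_out):
    """Split 2*l model outputs into l means and l log-stddevs (assumed layout)."""
    l = len(model_out) // 2
    return model_out[:l], model_out[l:]


def sample_action(model_out, explore, rng=random):
    """Return a sampled action if explore is True, otherwise the mean."""
    mean, log_std = split_params(model_out)
    if not explore:
        # Deterministic action: no noise is added.
        return list(mean)
    # Exploratory action: mean plus Gaussian noise scaled by the stddev.
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


def entropy(model_out):
    """Entropy of the diagonal Gaussian, computed from the log-stddevs only."""
    _, log_std = split_params(model_out)
    return sum(s + 0.5 * math.log(2.0 * math.pi * math.e) for s in log_std)


# Hypothetical model output for a 2-dimensional action space (l = 2).
model_out = [0.5, -1.0, 0.0, 0.0]
greedy_action = sample_action(model_out, explore=False)
noisy_action = sample_action(model_out, explore=True)
```

Since the entropy here depends only on the log-stddev outputs, an entropy term in the loss pushes directly on the variance of the sampled actions, which is what point 5 is getting at.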