According to the documentation, StochasticSampling is “An exploration that simply samples from a distribution”, but I am still wondering what StochasticSampling actually does. If I am not wrong, it adds random noise to the actions.
Assume that I am training a continuous-action agent, such as PPO with a continuous action space. Is this noise constant during training? If not, how can I choose the decay of the StochasticSampling noise?
Hey @carlorop , thanks for posting this question!
The StochasticSampling exploration component does not have any decay mechanism built in. It really does simply sample from the distribution given by the model’s outputs (e.g. n logits → n probabilities (summing to 1.0) → sample an action from the categorical distribution thus parameterized).
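To make the mechanism concrete, here is a minimal, self-contained sketch of that sampling step (stdlib only, not RLlib’s actual implementation; all values are made up for illustration). The discrete case samples from a softmax-parameterized categorical distribution; the continuous case shows that the “noise” is just drawing from the Gaussian whose mean and log-std the model outputs:

```python
import math
import random

# --- Discrete case ---
# Hypothetical model outputs (logits) for an action space with 3 actions.
logits = [2.0, 0.5, -1.0]

# Logits -> probabilities (summing to 1.0) via softmax.
exps = [math.exp(l) for l in logits]
probs = [e / sum(exps) for e in exps]

# Sample an action index from the categorical distribution thus parameterized.
action = random.choices(range(len(probs)), weights=probs, k=1)[0]

# --- Continuous case ---
# Here the model instead outputs the parameters of a diagonal Gaussian
# (one mean and log-std per action dimension); sampling from it *is* the noise.
mean = [0.1, -0.3]
log_std = [-1.0, -1.0]
cont_action = [random.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]
```

Note that nothing in this loop decays over time by itself; in the continuous case the spread of the actions shrinks only if training drives the learned log-std down.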
It’s used by algos such as PPO/IMPALA/APPO/PG/etc… (basically most on-policy algos) by default.
Other exploration components (e.g. EpsilonGreedy) do have a decay mechanism.
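For comparison, this is roughly what switching to a decaying exploration component looks like in an RLlib config dict (old API stack); the parameter values below are illustrative assumptions, not recommendations:

```python
# Sketch of an RLlib exploration_config that uses EpsilonGreedy with its
# built-in linear decay, instead of the default StochasticSampling.
# All numeric values here are placeholders for illustration only.
config = {
    "explore": True,
    "exploration_config": {
        "type": "EpsilonGreedy",
        "initial_epsilon": 1.0,       # start fully random
        "final_epsilon": 0.02,        # floor reached after decay
        "epsilon_timesteps": 10_000,  # decay linearly over this many timesteps
    },
}
```

EpsilonGreedy only applies to discrete action spaces, though, so it is not a drop-in answer for the continuous-PPO case asked about above.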
Thank you very much for your reply. However, I am still trying to get my head around the case of multidimensional continuous outputs. From your answer, I assume that in the case of discrete algorithms it just adds noise to the logits. Would it just add noise to the actions in the case of continuous distributions?