According to the documentation, StochasticSampling is “An exploration that simply samples from a distribution”, but I am still wondering what StochasticSampling actually does. If I am not mistaken, it adds random noise to the actions.
Assume that I am training a continuous agent, such as continuous PPO. Is this noise constant during training? If not, how can I choose the decay of the StochasticSampling noise?
Hey @carlorop , thanks for posting this question!
The StochasticSampling exploration component does not have any built-in decay mechanism. It simply samples from the distribution given by the model’s outputs (e.g. n logits → n probabilities (summing to 1.0) → sample an action from the categorical distribution parameterized by those probabilities).
It’s used by algos such as PPO/IMPALA/APPO/PG/etc… (basically most on-policy algos) by default.
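To make the logits → probabilities → sample chain above concrete, here is a minimal sketch in plain Python. It is not RLlib's actual implementation; the function names (`softmax`, `sample_action`, `greedy_action`) are made up for illustration. The point it tries to show is that the randomness comes from the distribution itself, so there is no separate noise term or decay schedule to configure.

```python
import math
import random


def softmax(logits):
    """Turn n logits into n probabilities that sum to 1.0."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Stochastic sampling: draw an action index from the categorical
    distribution parameterized by the logits (hypothetical helper)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_action(logits):
    """Deterministic alternative: always pick the most likely action."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Usage: the same logits can yield different sampled actions,
# while the greedy choice is always the same.
rng = random.Random(0)
logits = [2.0, 1.0, 0.1]
sampled = [sample_action(logits, rng) for _ in range(10)]
print("sampled:", sampled)
print("greedy: ", greedy_action(logits))
```

In this sketch, how much the agent explores depends only on how peaked the distribution is, so exploration changes over training only as the model's outputs change.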
Other exploration components (e.g. EpsilonGreedy) do have a decay mechanism.
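For contrast, here is a rough sketch of what a decay mechanism in an epsilon-greedy scheme can look like. This is a generic illustration, not RLlib's EpsilonGreedy code; the function names, the linear schedule, and the default values (`initial`, `final`, `decay_steps`) are all assumptions chosen for the example.

```python
import random


def linear_epsilon(step, initial=1.0, final=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from `initial` to `final` over
    `decay_steps` steps, then hold it at `final` (hypothetical schedule)."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return initial + frac * (final - initial)


def epsilon_greedy_action(q_values, step, rng=random):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest value."""
    eps = linear_epsilon(step)
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Usage: early steps are mostly random, late steps mostly greedy.
rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]
for step in (0, 5_000, 20_000):
    action = epsilon_greedy_action(q_values, step, rng)
    print(step, round(linear_epsilon(step), 3), action)
```

Here the amount of exploration is set by an explicit schedule, whereas with stochastic sampling it comes from the policy distribution itself.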