Decay of StochasticSampling

carlorop · April 18, 2022, 7:03pm

According to the documentation, StochasticSampling is “An exploration that simply samples from a distribution”, but I am still wondering what StochasticSampling actually does. If I am not mistaken, it adds random noise to the actions.

Assume that I am training an agent with continuous actions, such as continuous-action PPO. Is this noise constant during training? If not, how can I choose the decay of the StochasticSampling noise?

sven1977 · April 26, 2022, 8:53am

Hey @carlorop , thanks for posting this question!

The StochasticSampling exploration component does not have any built-in decay mechanism. It really just samples from the distribution given by the model’s outputs (e.g. n logits → n probabilities (summing to 1.0) → sample an action from the categorical distribution parameterized this way).
It’s used by default by algos such as PPO/IMPALA/APPO/PG/etc. (basically most on-policy algos).
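A minimal sketch of that discrete pipeline in plain Python, for illustration only (the function names `softmax` and `stochastic_sample` are hypothetical, not RLlib's API): the logits are turned into probabilities with a softmax, and one action index is drawn from the resulting categorical distribution. Note that nothing is added to the logits; the randomness comes entirely from the draw itself.

```python
import math
import random

def softmax(logits):
    """Turn n logits into n probabilities that sum to 1.0."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stochastic_sample(logits, rng=random):
    """Draw one action index from the categorical distribution
    parameterized by `logits` (no noise is added to the logits)."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs

rng = random.Random(0)
action, probs = stochastic_sample([2.0, 1.0, 0.1], rng)
print(action, [round(p, 3) for p in probs])
```

Because the probabilities come straight from the model, how much this explores depends only on how peaked the learned logits are; there is no separate schedule to tune.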

Other exploration components (e.g. EpsilonGreedy) do have a decay mechanism.
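For contrast, here is a minimal sketch of epsilon-greedy with a linear decay. The parameter names loosely mirror RLlib's EpsilonGreedy options (`initial_epsilon`, `final_epsilon`, `epsilon_timesteps`), but the functions below are hypothetical and standalone, not RLlib's implementation. The point is that the decay lives in the exploration component itself, which is exactly what StochasticSampling lacks.

```python
import random

def linear_epsilon(t, initial_epsilon, final_epsilon, epsilon_timesteps):
    """Linearly anneal epsilon from initial_epsilon to final_epsilon over
    epsilon_timesteps steps, then hold it at final_epsilon."""
    if t >= epsilon_timesteps:
        return final_epsilon
    frac = t / epsilon_timesteps
    return initial_epsilon + frac * (final_epsilon - initial_epsilon)

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the greedy (argmax) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

rng = random.Random(0)
q = [0.1, 0.9, 0.3]
for t in (0, 5_000, 10_000):
    eps = linear_epsilon(t, 1.0, 0.05, 10_000)
    print(t, round(eps, 3), epsilon_greedy(q, eps, rng))
```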

carlorop · June 9, 2022, 3:33pm

Thank you very much for your reply; however, I am still trying to get my head around it for the case of multidimensional continuous outputs. From your answer, I assume that in the case of discrete action spaces it just adds noise to the logits. Would it just add noise to the actions in the case of continuous distributions?
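For reference, continuous-action policies such as continuous PPO commonly use a diagonal Gaussian: the model produces a mean and a log standard deviation per action dimension, and the action is drawn from that distribution. As far as I know this is RLlib's default for Box action spaces, but the sketch below is a standalone illustration with a hypothetical output layout (`[means..., log_stds...]`) and a hypothetical function name, not RLlib's code. The spread is itself a model output, so it changes only as training updates it, not via an exploration schedule.

```python
import math
import random

def sample_diag_gaussian(model_out, rng):
    """Sample a continuous action from a diagonal Gaussian.

    Assumed (hypothetical) layout: model_out = [mean_1..mean_d, log_std_1..log_std_d].
    """
    d = len(model_out) // 2
    means, log_stds = model_out[:d], model_out[d:]
    # One independent draw per action dimension: mean + std * N(0, 1).
    return [mu + math.exp(ls) * rng.gauss(0.0, 1.0) for mu, ls in zip(means, log_stds)]

rng = random.Random(0)
# A 2-D action: means (0.5, -1.0), stds (0.5, 0.1).
out = [0.5, -1.0, math.log(0.5), math.log(0.1)]
print(sample_diag_gaussian(out, rng))
```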
