How does StochasticSampling work?

arturn · June 27, 2022, 3:07pm

Also in reference to your other post: StochasticSampling will, if used in an algorithm, be called in the policies, like here.
The distribution of the actions (and therefore of the noise, if you will) is parameterized by the outputs of your model. For example, for a diagonal Gaussian of size l, 2*l model outputs are needed to parameterize the distribution: l for the means and l for the standard deviations.
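To make the 2*l point concrete, here is a minimal NumPy sketch, not RLlib's actual code. It assumes the common convention that the first half of the model output holds the means and the second half holds log standard deviations; the helper name split_diag_gaussian_params is hypothetical.

```python
import numpy as np


def split_diag_gaussian_params(model_out, action_dim):
    """Split a flat model output of size 2 * action_dim into (mean, std).

    Assumption: first half = means, second half = log standard deviations.
    """
    model_out = np.asarray(model_out, dtype=np.float64)
    if model_out.shape[-1] != 2 * action_dim:
        raise ValueError("expected 2 * action_dim model outputs")
    mean = model_out[..., :action_dim]
    log_std = model_out[..., action_dim:]
    # Exponentiating keeps the standard deviation strictly positive.
    return mean, np.exp(log_std)


# Action space of size l = 2, so the model emits 4 numbers.
model_out = np.array([0.5, -1.0, 0.0, np.log(2.0)])
mean, std = split_diag_gaussian_params(model_out, action_dim=2)
```

Predicting the log of the standard deviation rather than the standard deviation itself is a design choice that lets the model output any real number while the resulting stddev stays positive.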
The parameters of an exploratory action sampling step depend on the distribution used. The standard deviation (stddev) is one of these.
A policy includes the stochastic sampling step in its compute_action methods. Therefore, choosing explore=True will lead to an exploratory action being output.
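As a rough illustration of what the explore flag changes, the sketch below samples from the distribution when explore is True and otherwise returns a deterministic action. It assumes a diagonal Gaussian and assumes the deterministic action is the mean; the function sample_action is hypothetical and only mimics the idea, it is not the RLlib implementation.

```python
import numpy as np


def sample_action(mean, std, explore, rng):
    """Return an exploratory sample if explore is True, else the mean."""
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    if explore:
        # Exploratory action: mean plus Gaussian noise scaled by std.
        return mean + std * rng.standard_normal(mean.shape)
    # Deterministic action (assumed here to be the distribution's mean).
    return mean.copy()


rng = np.random.default_rng(0)
mean, std = np.array([0.5, -1.0]), np.array([1.0, 2.0])
greedy = sample_action(mean, std, explore=False, rng=rng)
noisy = sample_action(mean, std, explore=True, rng=rng)
```

Note that the noise is not a separate, fixed setting here: its scale comes from std, which in turn comes from the model outputs.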
The entropy is calculated based on the distribution parameters output by your model. This way, the entropy loss can efficiently control the variance.
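For a diagonal Gaussian, the entropy depends only on the standard deviations, which is why an entropy term in the loss acts directly on the variance. A small sketch using the closed-form entropy, assuming the model outputs log standard deviations (the function name diag_gaussian_entropy is hypothetical):

```python
import numpy as np


def diag_gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian: sum_i (log_std_i + 0.5 * log(2*pi*e))."""
    log_std = np.asarray(log_std, dtype=np.float64)
    return np.sum(log_std + 0.5 * np.log(2.0 * np.pi * np.e), axis=-1)


narrow = diag_gaussian_entropy(np.log([0.1, 0.1]))
wide = diag_gaussian_entropy(np.log([1.0, 1.0]))
```

A wider distribution has higher entropy, so an entropy bonus pushes the stddev up (more exploration), while the rest of the loss can push it down as the policy becomes more certain.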
