According to the documentation, StochasticSampling is "An exploration that simply samples from a distribution", but I am still wondering what StochasticSampling actually does. What is that distribution? Is it sampling completely random states/actions? Is it adding Gaussian noise to the states or actions?
Thank you @mannyv. I can read in the documentation that it includes the following option:
random_timesteps – The number of timesteps for which to act completely randomly. Only after this number of timesteps, actual samples will be drawn to get exploration actions.
However, this parameter is set to 0 by default. How is it applied during training? Does the system choose random actions only for these initial timesteps, or does it keep choosing random actions for some timesteps later in training as well?
That setting is for completely random actions. If you set it to a non-zero value, then rather than using the policy to determine actions, it will sample a random value from the action space. It does this for as long as the total number of sampled steps is less than your value.
This is a different behavior from your previous question. After random_timesteps, it will start using the policy to generate actions. StochasticSampling adds noise to the "logits" produced by the policy and then uses those values to choose an action.
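If it helps, here is a minimal self-contained sketch of the two phases described above (plain Python, not actual RLlib code; the function names are hypothetical): uniform-random actions until random_timesteps is reached, then an action sampled from the softmax of the policy's logits rather than a greedy argmax.

```python
import math
import random

def stochastic_action(logits, rng):
    """Sample an action index from softmax(logits) -- i.e. draw from the
    policy's distribution instead of taking the argmax."""
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against floating-point rounding

def exploration_action(logits, n_actions, timestep, random_timesteps, rng):
    """Sketch of the random_timesteps gate: before the threshold the policy
    is ignored entirely; afterwards the policy distribution is sampled."""
    if timestep < random_timesteps:
        return rng.randrange(n_actions)  # completely random action
    return stochastic_action(logits, rng)
```

In RLlib itself the actual sampling is delegated to the policy's action-distribution class; this sketch only illustrates the threshold behavior described above.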
Just to confirm: I am using a continuous action space, so I guess the logits returned by the policy are scaled to the action range, and this exploration strategy adds noise from a uniform random distribution to those logits. Is that right? Do you know what the boundaries of that random distribution are?
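For what it's worth, a common setup for continuous spaces (and, as far as I understand, the one RLlib's default models follow) is for the policy to output a mean and a log standard deviation, with the stochastic action being a Gaussian sample clipped to the action bounds, rather than uniform noise with fixed boundaries. A hedged sketch of that interpretation (function name hypothetical):

```python
import math
import random

def continuous_stochastic_action(mean, log_std, low, high, rng):
    """Sketch: interpret the policy outputs as the mean and log-std of a
    Gaussian, sample from it, and clip to the action-space bounds.
    The 'noise' scale is therefore learned (via log_std), not fixed."""
    std = math.exp(log_std)
    sample = rng.gauss(mean, std)
    return max(low, min(high, sample))  # clip into [low, high]
```

Under this view there are no fixed uniform boundaries: the spread of the exploration noise is controlled by the learned log-std, and only the final action is constrained to the action range.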