Meaning of StochasticSampling for exploration

carlorop · February 14, 2022, 2:41pm

According to the documentation, StochasticSampling is “An exploration that simply samples from a distribution”, but I am still wondering what StochasticSampling does. What is that distribution? Is it sampling completely random states/actions? Is it adding Gaussian noise to the states or actions?

It is part of the common parameters listed in the RLlib Training APIs documentation (Ray 2.0.0.dev0).

gjoliver · February 14, 2022, 8:39pm

It corresponds to this Exploration strategy:

github.com

ray-project/ray/blob/master/rllib/utils/exploration/stochastic_sampling.py

import functools
import gym
import numpy as np
from typing import Optional, Union

from ray.rllib.models.action_dist import ActionDistribution
from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.utils.annotations import override
from ray.rllib.utils.exploration.exploration import Exploration
from ray.rllib.utils.exploration.random import Random
from ray.rllib.utils.framework import (
    get_variable,
    try_import_tf,
    try_import_torch,
    TensorType,
)
from ray.rllib.utils.tf_utils import zero_logps_from_actions

tf1, tf, tfv = try_import_tf()
torch, _ = try_import_torch()

This file has been truncated.

It basically samples from an action distribution, which is most likely constructed from the output of your NN.
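As a rough illustration of what "sampling from an action distribution" can mean, here is a minimal NumPy sketch. It is not RLlib's implementation: the helper names `sample_discrete` and `sample_continuous` are hypothetical, and it assumes a categorical distribution over logits for discrete actions and a diagonal Gaussian (mean and log-std outputs) for continuous ones.

```python
import numpy as np


def sample_discrete(logits, rng):
    """Sample an action index from a categorical distribution defined by logits."""
    logits = np.asarray(logits, dtype=np.float64)
    # Softmax with max-subtraction for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


def sample_continuous(mean, log_std, rng):
    """Sample an action from a diagonal Gaussian given mean and log-std outputs."""
    mean = np.asarray(mean, dtype=np.float64)
    std = np.exp(np.asarray(log_std, dtype=np.float64))
    return mean + std * rng.standard_normal(mean.shape)


rng = np.random.default_rng(0)
# Hypothetical network outputs, used only for illustration.
discrete_action = sample_discrete([2.0, 0.5, -1.0], rng)
continuous_action = sample_continuous([0.0, 1.0], [-1.0, -1.0], rng)
```

In this sketch the randomness comes entirely from the distribution that the network outputs parameterize, so a more peaked distribution means less exploration.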

carlorop · February 15, 2022, 1:27pm

Is this action distribution an empirical distribution function obtained from the previous actions?

mannyv · February 15, 2022, 1:53pm

Hi @carlorop,

No, it is uniform random.

carlorop · February 15, 2022, 4:56pm

Thank you @mannyv. I can read in the documentation that it includes the following option:

random_timesteps – The number of timesteps for which to act completely randomly. Only after this number of timesteps, actual samples will be drawn to get exploration actions.

However, this parameter is set to 0 by default. How is it applied during training? Is the system choosing random actions just for these timesteps? Is it also choosing random actions for some timesteps later in training?

mannyv · February 15, 2022, 6:03pm

@carlorop,

That setting is for completely random actions. If you set it to a non-zero value, then rather than using the policy to determine actions, it will generate a random value from the action space. It will do that for as long as the total number of sampled steps is less than your value.

This is a different behavior from the one in your previous question. After random_timesteps, it will start to use the policy to generate actions. StochasticSampling will add noise to the “logits” produced by the policy and then use these values to choose an action.
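The two phases described above can be sketched roughly as follows. This is an assumption-laden illustration, not RLlib's code: `get_exploration_action` and `policy_sample` are hypothetical names, the action space is assumed to be a box with bounds `low` and `high`, and the policy is replaced by a stand-in Gaussian.

```python
import numpy as np


def get_exploration_action(timestep, random_timesteps, low, high, policy_sample_fn, rng):
    """Pick an action: uniform random during warm-up, policy sample afterwards."""
    if timestep < random_timesteps:
        # Warm-up phase: ignore the policy and draw uniformly from the action space.
        return rng.uniform(low, high)
    # After warm-up: draw from the policy's action distribution.
    return policy_sample_fn(rng)


rng = np.random.default_rng(0)
low, high = np.array([-1.0]), np.array([1.0])


def policy_sample(rng):
    # Stand-in for the policy (assumption): a Gaussian with mean 0.3 and std 0.1.
    return 0.3 + 0.1 * rng.standard_normal(1)


# With random_timesteps=5, steps 0-4 are uniform random and steps 5-9 use the policy.
actions = [
    get_exploration_action(t, 5, low, high, policy_sample, rng)
    for t in range(10)
]
```

With `random_timesteps` left at 0, the first branch is never taken, so every action comes from the policy's distribution from the first step on.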

carlorop · February 16, 2022, 9:27am

Thank you mannyv, that was really helpful.

Just to confirm: I am using a continuous action space, so I guess that the logits returned by the policy are scaled to the action range. Hence, this exploration strategy adds noise from a uniform random distribution to these logits. Is that right? Do you know what the boundaries of the random distribution are?
