All or nothing (Explore or sample) actions - correct for each step?

hridayns · October 5, 2022, 10:59am

How can I tell which actions were randomly chosen (stochastic) and which were deterministic? I am digging into StochasticSampling as the exploration choice for the multi-agent PPO algorithm I am currently running, and I have a question about this code:

    stochastic_actions = tf.cond(
        pred=tf.convert_to_tensor(ts < self.random_timesteps),
        true_fn=lambda: (
            self.random_exploration.get_tf_exploration_action_op(
                action_dist, explore=True
            )[0]
        ),
        false_fn=lambda: action_dist.sample(),
    )

Is it right that, if the predicate is true, all the agents in that step receive an exploratory action (instead of a sample from the action distribution), and if it is false, every agent acts on a sample from the action distribution?

Wouldn't it be better for it to behave as "each agent has a chance to either take an exploratory action or sample from the distribution"? Sorry, my understanding may not be accurate. But if it is, would it be wise to modify this so that every agent has a chance to either explore or sample from the distribution in that step? How would I go about this?

Please advise. Thanks.

mannyv · October 5, 2022, 10:19pm

Hi @hridayns,

If you look at the arguments of __init__, there is one called random_timesteps. This is a configuration parameter you can set in the exploration config. It indicates how many timesteps you want the policy to act completely randomly at the beginning of training. For PPO the default is 0, which means you will never hit the true branch of that conditional.
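
To make that concrete, here is a minimal sketch of where random_timesteps would be set and how the predicate decides the branch. The dict follows RLlib's exploration_config layout; the value 10_000 is only an illustrative choice, and uses_random_branch is a hypothetical helper written for this post, not an RLlib function.

```python
# Exploration settings as they would appear in an RLlib algorithm config.
# random_timesteps=10_000 is an illustrative value; for PPO the default is 0.
config = {
    "exploration_config": {
        "type": "StochasticSampling",
        "random_timesteps": 10_000,
    }
}


def uses_random_branch(ts: int, random_timesteps: int) -> bool:
    """Hypothetical helper mirroring the predicate `ts < self.random_timesteps`.

    The predicate is a single scalar, so whichever branch it selects applies
    to the whole batch of actions computed at that timestep.
    """
    return ts < random_timesteps


random_timesteps = config["exploration_config"]["random_timesteps"]
for ts in (0, 9_999, 10_000):
    branch = "random" if uses_random_branch(ts, random_timesteps) else "sample"
    print(ts, branch)
```

With random_timesteps left at 0, the helper returns False for every non-negative timestep, which matches the point above that the true branch is never taken.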

hridayns · October 6, 2022, 10:54am

Yes, I understand that now. So it takes a sample from the action distribution at every step. But would it be wise to modify it so that, at every step, there is a 50% chance of sampling from the action distribution and a 50% chance of performing a completely random action?
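
The per-agent 50/50 idea can be sketched in plain Python as below. This is an illustration of the logic only, not RLlib code: mix_actions, explore_prob, and the stand-in action lists are all hypothetical names and values, and the sketch leaves out the action log-probabilities that a real exploration class would presumably also have to return. The returned mask also records which actions were random and which were sampled.

```python
import random


def mix_actions(sampled, random_actions, explore_prob=0.5, rng=random):
    """Pick, independently per row, either a random action or the sampled one.

    Returns the chosen actions plus a boolean mask (True = random action),
    so the caller can tell afterwards which actions were exploratory.
    """
    if len(sampled) != len(random_actions):
        raise ValueError("both action lists must have the same length")
    # One independent draw per row instead of one scalar predicate per step.
    mask = [rng.random() < explore_prob for _ in sampled]
    actions = [r if m else s for s, r, m in zip(sampled, random_actions, mask)]
    return actions, mask


rng = random.Random(0)
sampled = [1, 1, 1, 1]  # stand-ins for action_dist.sample()
random_actions = [rng.randrange(4) for _ in sampled]  # uniform over 4 discrete actions
actions, mask = mix_actions(sampled, random_actions, explore_prob=0.5, rng=rng)
print(actions, mask)
```

In the TensorFlow version, the same selection could probably be expressed with a per-row boolean mask and tf.where in place of the scalar tf.cond, but that is an untested assumption rather than something taken from the StochasticSampling source.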
