How can I tell which actions were randomly chosen (stochastic) and which were deterministic? I am digging into StochasticSampling as the exploration choice for the multi-agent PPO algorithm I am currently running, and I have a question about this line:
    stochastic_actions = tf.cond(
        pred=tf.convert_to_tensor(ts < self.random_timesteps),
        true_fn=lambda: (
            self.random_exploration.get_tf_exploration_action_op(
                action_dist, explore=True
            )
        ),
        false_fn=lambda: action_dist.sample(),
    )
If I read this correctly, when the predicate is true, all agents in that step receive an exploratory action (instead of a sample from the action distribution), and when it is false, every agent acts on a sample from the action distribution. Is that right?
But wouldn't it be better for it to behave as “each agent has a chance to either take an exploratory action or sample from the distribution”? Sorry, my understanding may not be accurate. But if it is, would it be wise to modify this so that every agent has a chance to either explore or sample from the distribution in that step, and how would I go about it?
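To make the idea concrete, here is a minimal sketch of what I have in mind, written in plain Python rather than TensorFlow. The function name `per_agent_actions`, the `epsilon` parameter, and the list-of-probabilities input are my own placeholders and not part of RLlib; the sketch also assumes discrete actions. It draws one coin flip per agent and returns a mask recording which actions came from the random branch.

```python
import random


def per_agent_actions(action_probs, epsilon, rng):
    """Pick one discrete action per agent (hypothetical helper, not RLlib API).

    action_probs: one list of action probabilities per agent
                  (a stand-in for the per-agent action distribution).
    epsilon:      probability that a given agent takes a uniformly random action.
    rng:          a random.Random instance, passed in for reproducibility.

    Returns (actions, was_random), two lists with one entry per agent.
    """
    actions = []
    was_random = []
    for probs in action_probs:
        num_actions = len(probs)
        # Independent coin flip for this agent only.
        explore = rng.random() < epsilon
        if explore:
            # Exploratory branch: uniform over all actions.
            action = rng.randrange(num_actions)
        else:
            # Otherwise sample from this agent's own distribution.
            action = rng.choices(range(num_actions), weights=probs, k=1)[0]
        actions.append(action)
        was_random.append(explore)
    return actions, was_random


rng = random.Random(0)
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4]]
actions, was_random = per_agent_actions(probs, epsilon=0.5, rng=rng)
for agent_id, (action, is_random) in enumerate(zip(actions, was_random)):
    print(agent_id, action, "random" if is_random else "sampled")
```

In TensorFlow, I assume the same per-row selection could be expressed with a boolean mask and `tf.where(mask, random_actions, sampled_actions)` instead of a single `tf.cond`, with the mask kept around to tell the two kinds of actions apart. I am not sure how this interacts with PPO being on-policy, since the random actions are not drawn from the policy's own distribution.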
Please advise. Thanks.