I am trying to understand the SoftQ algorithm, so I ran it with the tuned params provided for CartPole and stepped through it in a debugger to follow the object flow. I'm confused by where I end up: we take the argmax() of the distribution instead of sample().
I'm trying to figure out whether I'm doing something wrong or whether my understanding of SoftQ is off. I thought that in SoftQ we would sample from the current action distribution?
In soft_q.py, get_exploration_action() creates the distribution, applies the temperature, and then delegates to StochasticSampling to perform the actual sampling.
But in StochasticSampling's _get_torch_exploration_action(), we end up taking the argmax of the distribution, via action = action_dist.deterministic_sample().
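To make sure I'm describing the flow correctly, here is a toy paraphrase of what I think is happening (this is my own self-contained sketch, not the actual RLlib source; the function name and the plain-Python softmax are mine):

```python
import math
import random

def soft_q_action_sketch(q_values, temperature, explore, rng=random):
    """Toy paraphrase of the flow I'm seeing: SoftQ builds a softmax
    distribution over q_values / temperature, then StochasticSampling
    either samples it (explore=True) or takes the argmax (explore=False,
    i.e. action_dist.deterministic_sample())."""
    # Softmax over temperature-scaled Q-values (numerically stabilized).
    scaled = [q / temperature for q in q_values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    if explore:
        # What I expected SoftQ to do: sample from the distribution.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # What I actually end up in: the deterministic argmax branch.
    return max(range(len(probs)), key=lambda i: probs[i])
```

With explore=False this always returns the greedy action, which matches what I observe in the debugger.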
Also, the explore flag inside StochasticSampling is always False, even when I set it to True in the config, so we always end up at action = action_dist.deterministic_sample().
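For reference, this is roughly the relevant part of my config (the temperature value here is illustrative, not the tuned CartPole value):

```python
# Roughly the relevant part of my config (values illustrative):
config = {
    "explore": True,  # set explicitly, but explore still arrives as False
    "exploration_config": {
        "type": "SoftQ",
        "temperature": 1.0,
    },
}
```

Is there something else in the config that forces explore to False (e.g. an evaluation setting), or is this the intended behavior for SoftQ?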