SoftQ Exploration policy

Is there a paper I can refer to in order to understand Stochastic Sampling-based or SoftQ-based exploration? I am using SoftQ exploration, but I need to understand this technique better and also cite it.

Is SoftQ exploration just another name for Boltzmann exploration?

Hi @vishalrangras,

I agree that information on this is a little sparse. With SoftQ we can keep the temperature constant, while Boltzmann exploration by definition reduces the temperature over time, afaik.
I'm not entirely sure though.
Can someone confirm this?
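
To make it concrete what I mean by "temperature", here is a minimal sketch of sampling an action from a softmax over Q-values. This is only my illustration, not the library's actual implementation: the function name `softq_sample` and its parameters are made up for this post, and I am assuming the action probabilities are proportional to exp(Q / temperature).

```python
import numpy as np


def softq_sample(q_values, temperature=1.0, rng=None):
    """Sample an action from a softmax over Q-values (hypothetical helper).

    Assumption: P(a) is proportional to exp(Q(a) / temperature).
    """
    rng = np.random.default_rng() if rng is None else rng
    q = np.asarray(q_values, dtype=np.float64)

    # Scale by the temperature, then subtract the max for numerical stability.
    logits = q / temperature
    logits = logits - logits.max()

    # Normalize the exponentiated logits into a probability distribution.
    probs = np.exp(logits)
    probs = probs / probs.sum()

    # Draw one action index according to those probabilities.
    action = int(rng.choice(len(q), p=probs))
    return action, probs


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    for t in (10.0, 1.0, 0.1):
        action, probs = softq_sample(q, temperature=t)
        print(f"temperature={t}: probs={np.round(probs, 3)}, action={action}")
```

A high temperature gives nearly uniform sampling and a low temperature gives nearly greedy action selection. Keeping `temperature` fixed for the whole run would be the constant case I described; passing a value that shrinks over training would be the annealed case.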