I have a question about how the logp value is set during exploration. With StochasticSampling exploration, if the explore flag is set to True, logp is computed from the sampled action and the action distribution, which seems correct. Without exploration (explore == False), logp is set to 0, which corresponds to probability 1. That also seems correct, since with a deterministic policy we are certain to select this action.
However, what is not clear to me is why logp is also set to 0 when we use another exploration strategy that adds something to the action (like GaussianNoise). Wouldn't it make more sense to calculate how likely the exploration action is under the current action distribution?
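
To make the question concrete, here is a minimal sketch of the calculation I have in mind. It is plain Python, not RLlib code: the helper names `gaussian_logp` and `explore_with_gaussian_noise` are hypothetical, and I am assuming a one-dimensional Gaussian action distribution with a known mean and standard deviation, which may not match how the policy is actually set up.

```python
import math
import random


def gaussian_logp(action, mean, std):
    """Log-density of `action` under a 1-D Normal(mean, std)."""
    var = std ** 2
    return -0.5 * math.log(2.0 * math.pi * var) - (action - mean) ** 2 / (2.0 * var)


def explore_with_gaussian_noise(mean, std, noise_std, rng):
    """Hypothetical helper (not an RLlib function).

    Takes the deterministic action (assumed here to be the distribution
    mean), adds zero-mean Gaussian noise, and returns both the logp that
    is reported today (0) and the logp I am asking about.
    """
    deterministic_action = mean
    noisy_action = deterministic_action + rng.gauss(0.0, noise_std)

    logp_current = 0.0  # value set today, as described above
    # Proposed alternative: score the noisy action under the assumed
    # action distribution. Note this is a log-density, so it can be > 0.
    logp_proposed = gaussian_logp(noisy_action, mean, std)
    return noisy_action, logp_current, logp_proposed


rng = random.Random(0)  # fixed seed so the run is repeatable
action, logp_current, logp_proposed = explore_with_gaussian_noise(
    mean=0.5, std=0.2, noise_std=0.1, rng=rng
)
print(action, logp_current, logp_proposed)
```

In this sketch `logp_proposed` changes with how far the noise pushes the action away from the mean, while `logp_current` stays at 0 regardless of the noise.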
Thanks in advance for any answers!