**How severely does this issue affect your experience of using Ray?**

- High: It blocks me from completing my task.

**tl;dr:** the action probabilities returned by `compute_log_likelihoods` do not resemble the actual behavior of the agent when sampling actions.

**Long:** I’m trying to extract a tabular policy from a trained DQN agent by querying the action probabilities at every possible state. After I restore the trained agent, calling `compute_log_likelihoods` on its policy over the whole action space gives me something resembling a uniform distribution, which is unexpected since an optimal policy in my case should be deterministic. Indeed, when I run the restored agent, it performs well and is clearly running something close to an optimal policy. So the output of `compute_log_likelihoods` clearly doesn’t describe what I think it does. What does it describe and how do I compute the probability of an action?
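To make the mismatch concrete, here is a minimal NumPy sketch that does not touch RLlib at all. The Q-values are made up for illustration, and the idea that the log-likelihoods might come from a softmax over Q-values is only my guess, not something I have confirmed. Both helper functions are hypothetical names of my own. It shows how close Q-values would give a near-uniform distribution even though greedy action selection is fully deterministic:

```python
import numpy as np


def softmax_log_likelihoods(q_values):
    """Log-probabilities of a softmax over Q-values (assumed, not confirmed)."""
    q = np.asarray(q_values, dtype=np.float64)
    shifted = q - q.max()  # subtract the max for numerical stability
    return shifted - np.log(np.exp(shifted).sum())


def greedy_probabilities(q_values):
    """Probabilities of a purely greedy policy: all mass on the argmax action."""
    q = np.asarray(q_values, dtype=np.float64)
    probs = np.zeros_like(q)
    probs[np.argmax(q)] = 1.0
    return probs


# Hypothetical Q-values for one state with four actions (not from my agent).
q_values = [1.00, 1.05, 0.98, 1.02]

softmax_probs = np.exp(softmax_log_likelihoods(q_values))
greedy_probs = greedy_probabilities(q_values)

print("softmax probabilities:", softmax_probs)
print("greedy probabilities: ", greedy_probs)
```

If something like this is what happens internally, the tabular policy I actually want would be closer to the greedy one, possibly mixed with whatever exploration the agent uses at evaluation time.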

Here is all the code I’m running:

Many thanks!