Confused by output of `compute_log_likelihoods`

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

TL;DR: the action probabilities I get from compute_log_likelihoods don't match the agent's actual behavior when it samples actions.

Long: I'm trying to extract a tabular policy from a trained DQN agent by querying the action probabilities at every possible state. After I restore the trained agent, calling compute_log_likelihoods on its policy over the whole action space gives me something resembling a uniform distribution, which is unexpected since an optimal policy in my case should be deterministic. Indeed, when I run the restored agent, it performs well and is clearly following something close to an optimal policy. So compute_log_likelihoods evidently doesn't return what I think it does. What does it actually return, and how do I compute the probability the policy assigns to an action?

Here is all the code I'm running:
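(Condensed to its essentials below; the env name, checkpoint path, and observation are placeholders, and the imports assume the older ray.rllib.agents.dqn API — newer Ray versions expose this under ray.rllib.algorithms.dqn.)

```python
import numpy as np
import ray
from ray.rllib.agents.dqn import DQNTrainer  # newer Ray: from ray.rllib.algorithms.dqn import DQN

ray.init()

# Rebuild the trainer and restore it from the checkpoint (placeholders).
trainer = DQNTrainer(config={"framework": "torch", "num_workers": 0}, env="MyDiscreteEnv-v0")
trainer.restore("/path/to/checkpoint")

policy = trainer.get_policy()
num_actions = policy.action_space.n

# One (placeholder) observation in the policy's observation space,
# repeated once per possible action.
obs = np.zeros(policy.observation_space.shape, dtype=np.float32)
obs_batch = np.stack([obs] * num_actions)
actions = np.arange(num_actions)

# Log-likelihood of every action in this state, according to the policy.
logp = policy.compute_log_likelihoods(actions=actions, obs_batch=obs_batch)
probs = np.exp(np.asarray(logp))
print(probs)  # comes out roughly uniform over the actions

# Yet the greedy action the restored agent actually takes looks optimal:
print(trainer.compute_single_action(obs, explore=False))
```

The printed probabilities are what I'd like to collect for every state in order to build the tabular policy.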

Many thanks!