How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
tl;dr: the action probabilities returned by compute_log_likelihoods do not resemble the actual behavior of the agent when sampling actions.
Long: I’m trying to extract a tabular policy from a trained DQN agent by querying the action probabilities at every possible state. After I restore the trained agent, calling compute_log_likelihoods on its policy over the whole action space gives me something resembling a uniform distribution, which is unexpected since an optimal policy in my case should be deterministic. Indeed, when I run the restored agent, it performs well and is clearly executing something close to an optimal policy. So the output of compute_log_likelihoods clearly doesn’t describe what I think it does. What does it describe, and how do I compute the probability of an action?
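To make the mismatch concrete, here is a minimal pure-Python sketch (independent of RLlib, with made-up log-likelihood values) of the check I’m doing: exponentiate the returned log-likelihoods to get probabilities, verify they sum to ~1 over the action space, and compare against empirical sampling frequencies.

```python
import math
import random

# Hypothetical log-likelihoods for the three actions in a single state
# (illustrative values only, not actual RLlib output).
log_likelihoods = [-2.3, -0.12, -4.6]

# If these are true log-probabilities, p(a) = exp(log_likelihood(a)).
probs = [math.exp(ll) for ll in log_likelihoods]

# Sanity check: summed over the whole action space they should be ~1.0.
total = sum(probs)

# Empirical comparison: sample actions from the implied distribution and
# count frequencies. In my case, the agent's observed behavior (near-
# deterministic) does not match the near-uniform distribution I get back.
counts = [0, 0, 0]
random.seed(0)
for _ in range(10_000):
    r = random.random()
    acc = 0.0
    for action, p in enumerate(probs):
        acc += p / total
        if r < acc:
            counts[action] += 1
            break
```

If the values returned by compute_log_likelihoods really were the policy’s sampling distribution, the counts above would track the agent’s observed action frequencies, which is exactly what I’m not seeing.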
Here is all the code I’m running: