How do you get action probabilities from a policy?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Given a state and a trained policy, how can I compute the action distribution for that state under the policy? Looking at the docs, it sounds like that's what the compute_log_likelihoods function is for, but it doesn't behave as expected, and when I asked about that function specifically I got no answers.
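For reference, here is roughly what I'm trying with compute_log_likelihoods. (algo is my trained Algorithm and env the matching discrete-action environment; both are placeholders here, and I may well be misusing the API, which is part of my question.)

    import numpy as np

    policy = algo.get_policy()
    obs = np.array([env.observation_space.sample()])

    # Ask the policy for the log-likelihood of each discrete action in this state,
    # then exponentiate to get what I assume are probabilities P(a|s).
    probs = []
    for a in range(env.action_space.n):
        logp = policy.compute_log_likelihoods(actions=np.array([a]), obs_batch=obs)
        probs.append(float(np.exp(float(logp[0]))))

    print(probs, sum(probs))  # I expected these to sum to ~1.0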

I’ve also looked at the example “Querying a policy’s action distribution” in the docs, but my results using that method aren’t making sense either. In the terminology of that example, I would expect

    np.sum([np.exp(dist.logp(a)) for a in actions])

to equal the size of the state space |S| (I’m summing over every state and action, and for each state the action probabilities should sum to 1), but instead I’m getting a much smaller number, so these logps must not mean what I think, i.e. they are not log(P(action|state)). The example is outdated (from_batch is deprecated, for instance), so I had to make some changes; maybe I’m ending up with the wrong distribution somehow?
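For completeness, here is my adaptation of that example for a single state. (Again, algo and env are placeholders for my trained Algorithm and environment; I’m assuming a torch policy and build torch tensors where the old example used from_batch, so this is a sketch of what I changed rather than the exact docs code.)

    import numpy as np
    import torch

    policy = algo.get_policy()
    obs = np.array([env.observation_space.sample()], dtype=np.float32)

    # Forward pass through the policy's model to get the distribution inputs (logits).
    logits, _ = policy.model({"obs": torch.from_numpy(obs)})

    # Build the action distribution the same way the example does.
    dist = policy.dist_class(logits, policy.model)

    # Query the log-prob of every discrete action for this one state.
    probs = [
        float(torch.exp(dist.logp(torch.tensor([a]))))
        for a in range(env.action_space.n)
    ]
    print(probs, sum(probs))  # for a single state I'd expect this sum to be ~1.0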

Why is it so hard to find clear, consistent documentation on this? Computing an action distribution is one of the simplest things you could possibly want to do with an RL agent. I’d be happy to submit a PR with better docs on this if someone can explain what the intended solution is.