Confused by output of `compute_log_likelihoods`

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

TL;DR: the action probabilities I get from compute_log_likelihoods don't match the agent's actual behavior when it samples actions.

Long: I'm trying to extract a tabular policy from a trained DQN agent by querying the action probabilities at every possible state. After I restore the trained agent, calling compute_log_likelihoods on its policy over the whole action space gives me something resembling a uniform distribution, which is unexpected since an optimal policy in my case should be deterministic. Indeed, when I run the restored agent, it performs well and is clearly following something close to an optimal policy. So compute_log_likelihoods evidently doesn't return what I think it does. What does it actually return, and how do I compute the probability the policy assigns to an action?

Here is all the code I'm running:
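(Condensed to its essentials below; the env name, checkpoint path, and observation are placeholders, and the imports assume the older ray.rllib.agents.dqn API — newer Ray versions expose this under ray.rllib.algorithms.dqn.)

```python
import numpy as np
import ray
from ray.rllib.agents.dqn import DQNTrainer  # newer Ray: from ray.rllib.algorithms.dqn import DQN

ray.init()

# Rebuild the trainer and restore it from the checkpoint (placeholders).
trainer = DQNTrainer(config={"framework": "torch", "num_workers": 0}, env="MyDiscreteEnv-v0")
trainer.restore("/path/to/checkpoint")

policy = trainer.get_policy()
num_actions = policy.action_space.n

# One (placeholder) observation in the policy's observation space,
# repeated once per possible action.
obs = np.zeros(policy.observation_space.shape, dtype=np.float32)
obs_batch = np.stack([obs] * num_actions)
actions = np.arange(num_actions)

# Log-likelihood of every action in this state, according to the policy.
logp = policy.compute_log_likelihoods(actions=actions, obs_batch=obs_batch)
probs = np.exp(np.asarray(logp))
print(probs)  # comes out roughly uniform over the actions

# Yet the greedy action the restored agent actually takes looks optimal:
print(trainer.compute_single_action(obs, explore=False))
```

The printed probabilities are what I'd like to collect for every state in order to build the tabular policy.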

Many thanks!