Help understand the output from compute_actions()

dailizhang · February 14, 2023, 1:35pm

The action has 3 elements. What does action_dist_inputs mean?
I thought it was [mean, std, low, high], but that doesn’t match the dimension.
Can someone help me? Thanks.

mannyv · February 14, 2023, 7:06pm

action_prob: The probability of the action that is selected. In a categorical distribution this would be the softmax value for that action.

action_logp: The log of the action_prob

action_dist_inputs: The logits coming from the last layer of the model. These are used as input to the action distribution.

vf_preds: The value function estimate of the input state.
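
For the categorical case, the relationship between these fields can be sketched in plain NumPy. This is only an illustration of the definitions above, not RLlib's own code; the helper name categorical_outputs and the example logits are made up for the sketch.

```python
import numpy as np

def categorical_outputs(logits, action):
    """Illustrative helper (not an RLlib function): derive action_prob and
    action_logp for a chosen discrete action from the raw logits."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()  # softmax over actions
    action_prob = probs[action]        # probability of the selected action
    action_logp = np.log(action_prob)  # its log-probability
    return action_prob, action_logp

# Hypothetical logits for a 3-way discrete action space.
prob, logp = categorical_outputs([2.0, 1.0, 0.1], action=0)
print(prob, logp)
```

Here the logits play the role of action_dist_inputs, and exp(action_logp) recovers action_prob.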

dailizhang · February 15, 2023, 9:03pm

Thanks a lot.
I am focusing on continuous actions, for example with SAC. What do the values in action_dist_inputs mean for continuous actions?

When I call agent.compute_single_action() with unsquash_action=True, the actions are not normalized. However, when I call agent.get_policy().model().compute_action() with unsquash_action=True, the actions are normalized.
How can I convert the normalized actions back to non-normalized ones in batch?

dailizhang · February 15, 2023, 10:15pm

I checked the source code. For TorchSquashedGaussian, action_dist_inputs looks like [mean, log-std].
Based on the functions _squash() and _unsquash(), I am able to convert between the squashed and unsquashed action values.
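
A rough NumPy sketch of that batch conversion, for anyone landing here later. It rests on two assumptions: that action_dist_inputs is laid out as [mean, log-std] along the last axis (as found above), and that normalized actions live in [-1, 1] and map linearly onto the env's [low, high] bounds. The helper names split_dist_inputs, to_env_range and to_normalized are made up for this example and are not RLlib functions.

```python
import numpy as np

def split_dist_inputs(dist_inputs):
    """Split [mean, log_std] (assumed layout) along the last axis."""
    dist_inputs = np.asarray(dist_inputs, dtype=np.float64)
    mean, log_std = np.split(dist_inputs, 2, axis=-1)
    return mean, np.exp(log_std)  # return std rather than log-std

def to_env_range(normalized, low, high):
    """Map a batch of actions from [-1, 1] (assumed) to [low, high]."""
    normalized = np.clip(np.asarray(normalized, dtype=np.float64), -1.0, 1.0)
    return low + (normalized + 1.0) * 0.5 * (high - low)

def to_normalized(actions, low, high):
    """Inverse of to_env_range: map [low, high] back to [-1, 1]."""
    actions = np.asarray(actions, dtype=np.float64)
    return 2.0 * (actions - low) / (high - low) - 1.0

# Hypothetical 3-dimensional action space and a batch of two actions.
low = np.array([0.0, -2.0, 10.0])
high = np.array([1.0, 2.0, 20.0])
batch = np.array([[-1.0, 0.0, 1.0], [0.5, -0.5, 0.0]])
env_actions = to_env_range(batch, low, high)
print(env_actions)
```

With a 3-element action, this layout gives action_dist_inputs 6 values per row (3 means followed by 3 log-stds), which would explain why [mean, std, low, high] did not match the dimension.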
