Inconsistent actions from Algorithm.compute_single_action

doug57 · June 14, 2023, 2:43am

How severe does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

I have a custom environment. After running PPO training, I am computing actions from observations. The observation space is eight values. The action space is one of four values - 0, 1, 2, or 3.

When I reset my environment, I always get the same observation values. However, I get different actions if I call compute_single_action(obs) from the observations after resetting the environment.

I would expect that for equal observation values input, I would get the same output action every time the compute_single_action(obs) function is called.

Any help understanding why the output from compute_single_action(obs) changes from one function invocation to the next, with equal input, would be appreciated.

doug57 · June 14, 2023, 6:26pm

I have repeated the analysis using the CartPole-v1 environment and have the same concern. If I take an observation and repeatedly execute compute_single_action(observation), i.e.,repeatedly call the function with that observation, I get different actions. For instance, in one test, I got six 0 actions and four 1 actions.

Have I mistaken the function compute_single_action(observation) as the policy function in RL? If so, what should I use to access the policy function?

mannyv · June 14, 2023, 7:55pm

Hi @doug57,

This is likely expected behavior. Check out the exploration setting here:
https://docs.ray.io/en/latest/rllib/rllib-training.html#specifying-exploration-options

Also take note of this quote from the documentation:
“IMPORTANT NOTE: Policy gradient algorithms are able to find the optimal policy, even if this is a stochastic one. Setting “explore=False” here will result in the evaluation workers not using this optimal policy!”

doug57 · June 14, 2023, 9:15pm

Thank you for your response. I appreciate you confirming that this is the expected behavior. I will look into using “explore=False” during inference.

Topic		Replies	Views
Compute_single_action with explore=false returns the same result RLlib	2	95	August 20, 2024
Policy.compute_single_action() wrong outputs RLlib	0	226	October 30, 2023
Score the trained policy by ray RLlib	2	310	June 25, 2021
Compute actions Programmatically RLlib	1	285	February 5, 2022
Compute_single_action(obs, state) of policy and algo: different performance Checkpointing, Restoring	1	739	April 13, 2023

Inconsistent actions from Algorithm.compute_single_action

Related topics