Difference in predicted q values between runs

Hi,

I was training an APEX agent [with `noisy`, `n_steps`, and `num_atoms` set, and all other settings left at their defaults] on a custom environment.
My issue is that when I try predicting with the trained agent using compute_single_action, the Q-values I get are different on every call for the same input.

What am I missing?

@vishnukmd7 welcome to the forums.

Check out the information on customizing exploration behavior:

https://docs.ray.io/en/latest/rllib-training.html#customizing-exploration-behavior
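The relevant knob is the `explore` key in the trainer config (and the per-call `explore` argument of compute_single_action). Here is a minimal sketch, assuming a Ray 1.x-style ApexTrainer API; the env name and worker count are just placeholders for your own setup:

```python
import ray
from ray.rllib.agents.dqn import ApexTrainer

ray.init()

config = {
    "env": "CartPole-v0",   # placeholder: use your custom environment here
    "num_workers": 2,       # placeholder worker count
    "noisy": True,
    "n_step": 3,
    "num_atoms": 51,
    # Greedy (deterministic) action computation by default:
    "explore": False,
}

trainer = ApexTrainer(config=config)

# compute_single_action also accepts an explicit explore flag per call,
# which overrides the config setting for that one forward pass.
obs = trainer.get_policy().observation_space.sample()
action = trainer.compute_single_action(obs, explore=False)
```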

Hi,

I am still not able to figure it out. In the trainer config, I have set explore to False.

I am trying to get the Q-values by calling actor.get_policy().compute_single_action(observation, explore=None), and each time the values are different.
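To be concrete, this is roughly what I am running (a sketch; `actor` is my trained APEX trainer and `observation` is a single observation from my custom env):

```python
policy = actor.get_policy()

# explore=None should fall back to the config's explore setting (False here).
# Policy.compute_single_action returns (action, rnn_state_outs, extra_fetches);
# for DQN-family policies the Q-values are in the extra fetches dict.
action, state_out, extra_fetches = policy.compute_single_action(
    observation, explore=None
)
print(extra_fetches["q_values"])
```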