Difference in predicted q values between runs

Hi,

I was training an APEX agent [with `noisy`, `n_steps`, and `num_atoms` set, and all other settings left at their defaults] on a custom environment.
My issue is that when I try predicting with the trained agent using compute_single_action, the Q-values I get are different on every call for the same input.

What am I missing?

@vishnukmd7 welcome to the forums.

Check out the information on customizing exploration behavior:

https://docs.ray.io/en/latest/rllib-training.html#customizing-exploration-behavior
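The relevant knob is the `explore` key in the trainer config (and the per-call `explore` argument of compute_single_action). Here is a minimal sketch, assuming a Ray 1.x-style ApexTrainer API; the env name and worker count are just placeholders for your own setup:

```python
import ray
from ray.rllib.agents.dqn import ApexTrainer

ray.init()

config = {
    "env": "CartPole-v0",   # placeholder: use your custom environment here
    "num_workers": 2,       # placeholder worker count
    "noisy": True,
    "n_step": 3,
    "num_atoms": 51,
    # Greedy (deterministic) action computation by default:
    "explore": False,
}

trainer = ApexTrainer(config=config)

# compute_single_action also accepts an explicit explore flag per call,
# which overrides the config setting for that one forward pass.
obs = trainer.get_policy().observation_space.sample()
action = trainer.compute_single_action(obs, explore=False)
```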

Hi,

I am still not able to figure it out. In the trainer config, I have set explore to False.

I am trying to get the Q-values by calling actor.get_policy().compute_single_action(observation, explore=None), and each time the values are different.
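To be concrete, this is roughly what I am running (a sketch; `actor` is my trained APEX trainer and `observation` is a single observation from my custom env):

```python
policy = actor.get_policy()

# explore=None should fall back to the config's explore setting (False here).
# Policy.compute_single_action returns (action, rnn_state_outs, extra_fetches);
# for DQN-family policies the Q-values are in the extra fetches dict.
action, state_out, extra_fetches = policy.compute_single_action(
    observation, explore=None
)
print(extra_fetches["q_values"])
```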