Hi,
I trained an Apex agent on a custom environment with noisy, n_step and num_atoms set and all other options left at their defaults.
My issue is that when I predict with the trained agent using compute_single_action, the Q-values I get are different on each call for the same input.
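To show the symptom without my environment, here is a toy sketch in plain NumPy. It is not RLlib code: `NoisyLinear`, its fixed `sigma` and the `add_noise` flag are all made up for illustration, and it only reflects my assumption that a noisy layer draws fresh noise on every forward pass. Under that assumption, identical input gives different outputs, which looks like what I am seeing:

```python
import numpy as np


class NoisyLinear:
    """Toy linear layer (hypothetical, not RLlib's implementation).

    The weights are perturbed by fresh Gaussian noise on every forward
    pass unless add_noise is False.
    """

    def __init__(self, in_dim, out_dim, sigma=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w = self.rng.normal(size=(in_dim, out_dim))
        self.sigma = sigma  # fixed here; assumed to be learned in a real noisy net

    def forward(self, x, add_noise=True):
        if add_noise:
            # New noise sample on each call, so the output changes each time.
            w = self.w + self.sigma * self.rng.normal(size=self.w.shape)
        else:
            w = self.w
        return x @ w


layer = NoisyLinear(in_dim=4, out_dim=2)
obs = np.ones(4)  # the same input for every call

q_noisy_1 = layer.forward(obs)
q_noisy_2 = layer.forward(obs)
q_plain_1 = layer.forward(obs, add_noise=False)
q_plain_2 = layer.forward(obs, add_noise=False)

print("noisy calls equal:", np.allclose(q_noisy_1, q_noisy_2))
print("plain calls equal:", np.array_equal(q_plain_1, q_plain_2))
```

If this is what is happening, I would expect the variation to go away once the noise is switched off at inference time, but I do not know whether or how RLlib exposes that.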
What am I missing?