I’m evaluating a DQN agent using the Ray/RLlib rollout and comparing its behavior to that of a manually stepped model. I also managed to save the `tf.keras.Model` object as an h5 file, and I can step it manually. The inputs are the same, but the Q-values (and hence the actions) are different. For training I use `tf`, and for the manual rollout I have `import tensorflow as tf` instead of `_, tf, _ = try_import_tf()`. I also tried using `tf2` during training, but that does not solve the issue: the outputs are still different.
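To get a feel for whether the gap is just float noise or a real mismatch, I compare the two Q-value arrays numerically. A minimal sketch with made-up numbers; `q_rllib` and `q_manual` are stand-ins for the actual outputs of the two rollouts:

```python
import numpy as np

# Stand-ins for the Q-values from the RLlib rollout and the manual rollout.
# In practice these come from the policy's forward pass and from calling
# the reloaded keras model on the same observation.
q_rllib = np.array([0.1234567, -0.9876543, 0.5555555], dtype=np.float32)
q_manual = np.array([0.1234569, -0.9876540, 0.5555551], dtype=np.float32)

max_abs_diff = np.max(np.abs(q_rllib - q_manual))
print(f"max |dQ| = {max_abs_diff:.2e}")

# float32 round-off across two different execution paths is typically ~1e-6;
# anything much larger suggests a real modelling difference, not precision.
if np.allclose(q_rllib, q_manual, atol=1e-5):
    print("difference is consistent with float32 precision")
else:
    print("difference is too large for precision alone")

# Even tiny Q-value gaps can flip the argmax when two actions are near-tied.
print("same greedy action:", np.argmax(q_rllib) == np.argmax(q_manual))
```

If the max absolute difference is around 1e-6 but the chosen actions still diverge occasionally, that points at near-tied Q-values rather than a wrong model.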
It is worth mentioning that:
- the actions from the RLlib rollout vs. my manual rollout are quite close to each other, so I haven’t ruled out floating-point precision as the explanation.
- I’ve been careful to take the dueling behavior into account: I use `ray.rllib.agents.dqn.dqn_tf_policy.compute_q_values()` to compute the Q-values in my model (i.e. using the `state_score` model on top of `model_out`, as in the dueling setup).
- I have a custom model that subclasses the RLlib TF model class and builds its layers in `__init__()`. The flow is: `inputs` → custom embedding (also called `model_out`) → (`q_out`, `state_out`). Then I use something similar to `real_q_values = custom_q_values_fn(q_out, state_out, model_out)`, which has the expected shape.
- I save the model object to the h5 file mentioned above and reload it for the manual rollout.
- I take the argmax of the final Q-values (i.e. to reproduce deterministic, greedy action selection).
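For reference, the dueling combination I’m reproducing by hand looks roughly like the sketch below. This is a NumPy stand-in, not RLlib code: the shapes and variable names mirror my description above, the numbers are random, and `custom_q_values_fn` is my own helper, assuming the usual mean-centred dueling formula Q(s, a) = V(s) + A(s, a) − mean_a A(s, a):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_actions, embed_dim = 2, 4, 8

# Stand-in for the custom embedding: inputs -> model_out.
model_out = rng.normal(size=(batch, embed_dim)).astype(np.float32)

# Stand-ins for the two heads applied to model_out: q_out plays the role of
# the per-action (advantage) stream, state_out the scalar state-score stream.
q_out = rng.normal(size=(batch, num_actions)).astype(np.float32)
state_out = rng.normal(size=(batch, 1)).astype(np.float32)

def custom_q_values_fn(q_out, state_out, model_out):
    """Dueling combination with mean-centred advantages:
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    model_out is accepted for signature parity but unused here."""
    advantages_centered = q_out - q_out.mean(axis=1, keepdims=True)
    return state_out + advantages_centered

real_q_values = custom_q_values_fn(q_out, state_out, model_out)
assert real_q_values.shape == (batch, num_actions)  # the expected shape

# Greedy action selection, as in the evaluation rollout.
actions = np.argmax(real_q_values, axis=1)
print(actions)
```

Note that adding `state_out` and subtracting the advantage mean are constant per state, so they never change the argmax; if my manual actions still differ, the discrepancy must already be present in the embedding or the head outputs, not in this combination step.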
Has anyone seen a similar problem?