Yes, masking the forward works, but I'm not using dueling DQN. Basically, `DistributionalQTFModel` does `input → forward() → model_out`. Now, `model_out` contains the masked Q values, as I want. But if `q_hiddens` is specified, or `use_noisy` is `True`, other layers will be added on top of `model_out`, which I guess will break the model, since they will process the masked Q values and produce new values (also, I guess that layers taking `tf.float32.min` values as input will behave very badly).
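For reference, here is a minimal sketch of the masking pattern I mean, loosely based on RLlib's parametric-actions example. The observation keys `action_mask` and `real_obs`, and the `MaskedQModel` name, are assumptions for illustration; import paths vary across Ray versions:

```python
import tensorflow as tf
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork
from ray.rllib.agents.dqn.distributional_q_tf_model import DistributionalQTFModel


class MaskedQModel(DistributionalQTFModel):
    """Sketch: push Q logits of invalid actions down to tf.float32.min."""

    def __init__(self, obs_space, action_space, num_outputs,
                 model_config, name, **kw):
        super().__init__(obs_space, action_space, num_outputs,
                         model_config, name, **kw)
        # Internal net over the "real" part of the observation
        # (assumed dict obs space with "real_obs" / "action_mask" keys).
        self.internal_model = FullyConnectedNetwork(
            obs_space.original_space["real_obs"], action_space,
            num_outputs, model_config, name + "_internal")

    def forward(self, input_dict, state, seq_lens):
        action_mask = input_dict["obs"]["action_mask"]
        logits, _ = self.internal_model(
            {"obs": input_dict["obs"]["real_obs"]})
        # log(0) -> -inf, clamped to tf.float32.min so masked actions
        # can never win the argmax.
        inf_mask = tf.maximum(tf.math.log(action_mask), tf.float32.min)
        # model_out already contains the masked values here; any extra
        # q_hiddens / noisy layers stacked on top would then consume these
        # tf.float32.min entries, which is the problem I'm describing.
        return logits + inf_mask, state
```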