Hey @rfali, yeah, this should work with a custom Q-model (just subclass DQNTFModel and implement this logic there). You’d probably also have to change the DQN loss function, though.
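For example, a rough (untested) sketch of such a model with one Q-head per gamma could look like the following; the DistributionalQTFModel base class is the one RLlib's DQN uses internally, and all the other names (MultiGammaQModel, gammas, get_multi_gamma_q_values) are made up for illustration:

```python
from ray.rllib.agents.dqn.distributional_q_tf_model import DistributionalQTFModel
import tensorflow as tf


class MultiGammaQModel(DistributionalQTFModel):
    """Hypothetical multi-gamma Q-model: one Q-head per discount factor."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name, gammas=(0.9, 0.99), **kwargs):
        super().__init__(obs_space, action_space, num_outputs, model_config,
                         name, **kwargs)
        self.gammas = gammas
        # One dense Q-head per gamma, each mapping the base model's output
        # features to per-action Q-values.
        self.gamma_q_heads = [
            tf.keras.layers.Dense(action_space.n, name="q_head_gamma_{}".format(i))
            for i in range(len(gammas))
        ]

    def get_multi_gamma_q_values(self, model_out):
        # List of [BATCH, num_actions] Q-value tensors, one per gamma.
        return [head(model_out) for head in self.gamma_q_heads]
```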
Just create a new DQNTFPolicy via:
MyDQNPolicy = DQNTFPolicy.with_updates(loss_fn=[your own loss function])
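For reference, the loss function passed in there takes (policy, model, dist_class, train_batch) and returns the loss tensor to minimize. A rough, untested sketch of a multi-gamma 1-step TD loss (no target network, and it relies on the hypothetical gammas/get_multi_gamma_q_values from the model sketch above) could look like:

```python
import tensorflow as tf
from ray.rllib.policy.sample_batch import SampleBatch


def multi_gamma_loss(policy, model, dist_class, train_batch):
    actions = tf.cast(train_batch[SampleBatch.ACTIONS], tf.int32)
    rewards = train_batch[SampleBatch.REWARDS]
    dones = tf.cast(train_batch[SampleBatch.DONES], tf.float32)

    # Forward passes on current and next observations.
    model_out_t, _ = model({"obs": train_batch[SampleBatch.CUR_OBS]}, [], None)
    model_out_tp1, _ = model({"obs": train_batch[SampleBatch.NEXT_OBS]}, [], None)

    losses = []
    one_hot = tf.one_hot(actions, policy.action_space.n)
    for gamma, q_t, q_tp1 in zip(
            model.gammas,
            model.get_multi_gamma_q_values(model_out_t),
            model.get_multi_gamma_q_values(model_out_tp1)):
        q_t_selected = tf.reduce_sum(q_t * one_hot, axis=1)
        q_tp1_best = tf.reduce_max(q_tp1, axis=1)
        # 1-step TD target with this head's gamma (no target network here).
        target = rewards + gamma * (1.0 - dones) * tf.stop_gradient(q_tp1_best)
        losses.append(tf.reduce_mean(
            tf.math.squared_difference(q_t_selected, target)))
    # The sum of the per-gamma TD losses is what gets minimized.
    return tf.add_n(losses)
```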
The gamma from the config is only used in the loss function and (maybe) in the n-step postprocessing function, so you’d either have to set n_step=1 or also implement your own postprocessing function, like so:
MyDQNPolicy = DQNTFPolicy.with_updates(loss_fn=[your own loss function], postprocess_fn=[your own postprocessing fn doing n-step with different gammas])
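The postprocessing function takes (policy, sample_batch, other_agent_batches=None, episode=None) and returns the (modified) SampleBatch. Here's a very rough, untested sketch that just computes n-step reward sums per gamma and stores them in extra (made-up) batch columns; a complete n-step implementation would also shift NEXT_OBS and DONES the way RLlib's built-in n-step adjustment does:

```python
import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch

N_STEP = 3
GAMMAS = (0.9, 0.99)


def multi_gamma_postprocess(policy, sample_batch,
                            other_agent_batches=None, episode=None):
    rewards = sample_batch[SampleBatch.REWARDS].astype(np.float32)
    dones = sample_batch[SampleBatch.DONES]
    traj_len = len(rewards)

    for i, gamma in enumerate(GAMMAS):
        nstep_rewards = np.zeros(traj_len, dtype=np.float32)
        for t in range(traj_len):
            ret, discount = 0.0, 1.0
            # Sum up to N_STEP discounted rewards, stopping at episode end.
            for k in range(N_STEP):
                if t + k >= traj_len:
                    break
                ret += discount * rewards[t + k]
                discount *= gamma
                if dones[t + k]:
                    break
            nstep_rewards[t] = ret
        sample_batch["rewards_gamma_{}".format(i)] = nstep_rewards
    return sample_batch
```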
For action choices, you may also have to specify your own action_distribution_fn:
MyDQNPolicy = DQNTFPolicy.with_updates(loss_fn=[your own loss function], postprocess_fn=[your own postprocessing fn doing n-step with different gammas], action_distribution_fn=[your own action picking and action distribution fn])
Here is the docstring:
action_distribution_fn (Optional[Callable[[Policy, ModelV2, TensorType,
TensorType, TensorType],
Tuple[TensorType, type, List[TensorType]]]]): Optional callable
returning distribution inputs (parameters), a dist-class to
generate an action distribution object from, and internal-state
outputs (or an empty list if not applicable). If None, will either
use `action_sampler_fn` or compute actions by calling self.model,
then sampling from the so parameterized action distribution.
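And a rough, untested sketch matching that docstring (again relying on the hypothetical multi-gamma model from above): it averages the per-gamma Q-heads and returns them as distribution inputs together with the Categorical dist-class, similar to what the default DQN policy does with its Q-values:

```python
import tensorflow as tf
from ray.rllib.models.tf.tf_action_dist import Categorical


def multi_gamma_action_distribution_fn(policy, model, obs_batch, *,
                                       explore=True, is_training=False,
                                       **kwargs):
    model_out, _ = model({"obs": obs_batch, "is_training": is_training}, [], None)
    # Average the per-gamma Q-heads into a single set of per-action scores
    # (get_multi_gamma_q_values is the hypothetical helper from the model
    # sketch above).
    q_values = tf.add_n(model.get_multi_gamma_q_values(model_out)) / len(model.gammas)
    # Return: distribution inputs, dist-class, internal-state outputs.
    return q_values, Categorical, []
```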