Hi,
I figured out that I want to override the policy object for the DQN algorithm class in DQN.get_default_policy_class. I also traced the source code back to the DQNTorchPolicy constructor, but I am a bit lost as to where exactly the DQNTorchPolicy.compute_actions method resides. I traced it back to ray.rllib.policy.policy_template.build_policy_class, which contains a class definition policy_cls but does not yet explicitly define the compute_actions of the DQN. I only want to make a minimal change to its source, so that I keep exactly the same DQN implementation as before and can compare against it immediately.
I should probably add that it is important that the trajectory I track is that of the policy itself (and not of the epsilon steps). That is, I only ever want to see how the policy unrolls greedily, so that I can see the learning state of the network in the form of the unrolled trajectory.
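To make concrete what I mean by unrolling greedily, here is a minimal sketch in plain Python. It is not RLlib code: greedy_action, epsilon_greedy_action, unroll_greedy, and the toy chain environment (toy_q, toy_step) are all names I made up for illustration, and I am assuming the Q-values are available as a plain list per state.

```python
import random
from typing import Callable, List, Sequence, Tuple


def greedy_action(q_values: Sequence[float]) -> int:
    """Return the index of the largest Q-value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values: Sequence[float], epsilon: float, rng=None) -> int:
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    rng = rng or random
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def unroll_greedy(
    q_fn: Callable[[int], Sequence[float]],
    step_fn: Callable[[int, int], Tuple[int, bool]],
    start_state: int,
    max_steps: int,
) -> List[int]:
    """Follow argmax actions only and record the visited states."""
    state = start_state
    trajectory = [state]
    for _ in range(max_steps):
        action = greedy_action(q_fn(state))
        state, done = step_fn(state, action)
        trajectory.append(state)
        if done:
            break
    return trajectory


# Hypothetical toy problem: a chain of states 0..3, action 1 moves right.
def toy_q(state: int) -> Sequence[float]:
    return [0.0, 1.0]  # pretend the network always prefers "right"


def toy_step(state: int, action: int) -> Tuple[int, bool]:
    next_state = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    return next_state, next_state == 3


print(unroll_greedy(toy_q, toy_step, start_state=0, max_steps=10))  # [0, 1, 2, 3]
```

The trajectory I want to log is the one produced by unroll_greedy, while training itself may keep using something like epsilon_greedy_action.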
I also have a related side question: how exactly does the epsilon scheduler work? I will probably need to deactivate or replace it as well.
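For context, my current mental model of the schedule is a simple linear decay, sketched below in plain Python. This is only my assumption, not RLlib's actual code: the function linear_epsilon is made up, and the parameter names initial_epsilon, final_epsilon, and epsilon_timesteps are what I believe the epsilon-greedy exploration config uses, with default values chosen arbitrarily here.

```python
def linear_epsilon(
    timestep: int,
    initial_epsilon: float = 1.0,
    final_epsilon: float = 0.02,
    epsilon_timesteps: int = 10000,
) -> float:
    """Linearly anneal epsilon from initial_epsilon to final_epsilon.

    After epsilon_timesteps steps the value stays at final_epsilon.
    """
    # Fraction of the annealing period that has elapsed, clipped to [0, 1].
    fraction = min(max(timestep, 0) / epsilon_timesteps, 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


for t in (0, 5000, 10000, 20000):
    print(t, linear_epsilon(t))
```

Under this model, "deactivating" the scheduler would amount to setting both initial_epsilon and final_epsilon to 0, so that every action is greedy. Whether that is the right way to do it in RLlib is exactly what I am unsure about.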
Thanks!