DQNTorchPolicy; Custom Policy


I figured out that I need to override the policy class for the DQN algorithm via DQN.get_default_policy_class. I also traced the source code back to the DQNTorchPolicy constructor, but I am a bit lost as to where exactly the DQNTorchPolicy.compute_actions method is defined. I followed it to ray.rllib.policy.policy_template.build_policy_class, which contains a class definition policy_cls, but that class does not explicitly define the compute_actions of the DQN. I only want to make a minimal change to the source, so that I keep exactly the same DQN implementation as before and can compare against it directly.
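To narrow down where an inherited method actually lives, one option is to walk the class's method resolution order. The sketch below is plain Python, not RLlib code: BasePolicy, ToyDQNPolicy and this toy build_policy_class are hypothetical stand-ins that only mimic a factory returning a subclass which does not redefine the method.

```python
def find_defining_class(cls, method_name):
    """Return the first class in cls's MRO whose own namespace defines method_name."""
    for klass in cls.__mro__:
        if method_name in vars(klass):
            return klass
    return None


# Hypothetical stand-ins, not RLlib code: a base class that owns the method
# and a factory that builds a subclass without redefining it.
class BasePolicy:
    def compute_actions(self, obs_batch):
        return [0 for _ in obs_batch]


def build_policy_class(name, base):
    class policy_cls(base):
        pass

    policy_cls.__name__ = name
    policy_cls.__qualname__ = name
    return policy_cls


ToyDQNPolicy = build_policy_class("ToyDQNPolicy", BasePolicy)
owner = find_defining_class(ToyDQNPolicy, "compute_actions")
print(owner.__name__)  # prints BasePolicy
```

If the real class behaves the same way, calling find_defining_class(DQNTorchPolicy, "compute_actions") should name the base class to look at. I have not verified what it returns for RLlib.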

I should probably add that it is important that the trajectory I track is that of the policy (and not of the epsilon steps). That is, I only ever want to see how the policy unrolls greedily, so that I can see the learning state of the network in the form of the unrolled trajectory.
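To make the distinction concrete, here is a minimal sketch of what I mean by a greedy unroll versus epsilon steps. It is plain Python and not the RLlib implementation; toy_q and toy_step are hypothetical placeholders for the Q-network and the environment.

```python
import random


def greedy_action(q_values):
    """Index of the largest Q-value (first one on ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def greedy_rollout(q_function, step, start_state, max_steps):
    """Unroll the greedy policy and return the list of visited states."""
    trajectory = [start_state]
    state = start_state
    for _ in range(max_steps):
        state, done = step(state, greedy_action(q_function(state)))
        trajectory.append(state)
        if done:
            break
    return trajectory


# Hypothetical toy problem: states 0..4 on a line, action 0 = left,
# action 1 = right, episode ends at state 4.
def toy_q(state):
    # Toy Q-function that always prefers moving right.
    return [0.0, 1.0]


def toy_step(state, action):
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    return next_state, next_state == 4


print(greedy_rollout(toy_q, toy_step, 0, 10))  # [0, 1, 2, 3, 4]
```

The trajectory I want to record corresponds to greedy_rollout, never to epsilon_greedy_action. As far as I can tell, RLlib's compute_single_action accepts an explore argument that is meant for this, but I have not confirmed that explore=False yields exactly this behaviour for DQN.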

I also have a related side question: how exactly does the epsilon scheduler work? I will probably need to deactivate or replace it as well.
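For reference, this is what I assume a linear epsilon schedule looks like. It is a sketch under that assumption, not the actual RLlib scheduler. The parameter names mirror what I believe are the keys of the EpsilonGreedy exploration config (initial_epsilon, final_epsilon, epsilon_timesteps), and the default values are placeholders.

```python
def linear_epsilon(timestep, initial_epsilon=1.0, final_epsilon=0.02,
                   epsilon_timesteps=10000):
    """Linearly anneal epsilon over epsilon_timesteps, then hold final_epsilon."""
    # Fraction of the annealing period that has elapsed, clamped to [0, 1].
    fraction = min(max(timestep / epsilon_timesteps, 0.0), 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


for t in (0, 5000, 10000, 20000):
    print(t, round(linear_epsilon(t), 4))
```

Under this assumption, setting both initial_epsilon and final_epsilon to 0.0 would effectively deactivate the schedule, since epsilon then stays at zero for every timestep.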