DQNTorchPolicy; Custom Policy

Tim_Ruhkopf · March 1, 2024, 9:24am

Hi,

I figured out that I want to override the policy class that the DQN algorithm returns from DQN.get_default_policy_class. I traced the source code back to the DQNTorchPolicy constructor, but I am a bit lost as to where exactly the DQNTorchPolicy.compute_actions method resides. I followed it to ray.rllib.policy.policy_template.build_policy_class, which contains a class definition policy_cls but does not yet explicitly define DQN's compute_actions. I only want to make a minimal change to the source, so that I keep exactly the same DQN implementation as before and can compare against it directly.

I should probably add that it is important that the trajectory I track is that of the policy itself (and not of the epsilon-greedy exploration steps). I only ever want to see how the policy unrolls greedily, so that the unrolled trajectory shows the current learning state of the network.
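
To make concrete what I mean by a greedy unroll, here is a minimal sketch in plain Python. It does not use RLlib at all: `select_action`, `unroll`, `chain_step` and the toy Q-table are hypothetical names and data made up for illustration, and the Q-table stands in for the network's Q-values.

```python
import random


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def unroll(q_table, step_fn, start_state, horizon, epsilon=0.0, seed=0):
    """Roll out from start_state; epsilon=0.0 yields the purely greedy trajectory."""
    rng = random.Random(seed)
    state, trajectory = start_state, []
    for _ in range(horizon):
        action = select_action(q_table[state], epsilon, rng)
        trajectory.append((state, action))
        state = step_fn(state, action)
    return trajectory


# Hypothetical toy chain environment: action 1 moves right, action 0 stays put.
def chain_step(state, action):
    return min(state + action, 3)


# Hypothetical Q-values per state (index = action).
toy_q = {0: [0.1, 0.9], 1: [0.2, 0.8], 2: [0.7, 0.3], 3: [0.5, 0.5]}

greedy_trajectory = unroll(toy_q, chain_step, start_state=0, horizon=4)
print(greedy_trajectory)
```

With `epsilon=0.0` the random branch is never taken, so the trajectory depends only on the Q-values. That is the behaviour I would like to get out of the DQN policy when I record trajectories.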

I also have a related side question: how exactly does the epsilon scheduler work? I will probably need to deactivate or replace it as well.
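
For reference, this is the kind of schedule I have in mind when I say "epsilon scheduler". It is only a sketch of a generic linear decay under my own assumptions, not RLlib's actual code; `linear_epsilon` and its parameter names and default values are hypothetical.

```python
def linear_epsilon(timestep, initial_epsilon=1.0, final_epsilon=0.05,
                   decay_timesteps=10_000):
    """Linearly anneal epsilon from initial_epsilon to final_epsilon.

    After decay_timesteps the value stays at final_epsilon.
    """
    # Fraction of the decay period that has elapsed, clipped to [0, 1].
    fraction = min(max(timestep / decay_timesteps, 0.0), 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


# Epsilon at the start, halfway through, and after the decay period.
for t in (0, 5_000, 20_000):
    print(t, linear_epsilon(t))

# "Deactivating" such a schedule amounts to pinning both endpoints to zero,
# which makes every action greedy.
always_greedy = linear_epsilon(1_234, initial_epsilon=0.0, final_epsilon=0.0)
```

If the scheduler in RLlib works roughly like this, then replacing it with a constant zero (or bypassing exploration entirely when computing actions) would be enough for my purposes.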

Thanks!
