Writing custom RLModule for custom Algorithm

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi, I’m implementing a custom RL algorithm, and for the neural-net part I need to subclass either the Policy or the RLModule class. The documentation suggests using RLModule since it’s newer, but my code keeps calling the get_default_policy_class method. I also noticed that PPO, when set to use rl_module_api=True, still makes calls to the Policy class. So my questions are: Is it possible to write code that implements only the RLModule class, or do I still need to subclass Policy? And why are Policy classes still used in Algorithm.setup() when the config says to use rl_module_api?

Hi @zygis, and welcome to the forum!

Great question! The RLModule is part of our new stack and is meant to replace the ModelV2 (not the Policy). The RLModule only needs the Policy if the algorithm samples with the RolloutWorker. In our new stack we are going to replace the RolloutWorker with the EnvRunner, which will no longer need a Policy. However, until the new stack is complete and fully tested, you still need to subclass TFPolicy or TorchPolicy to use the RLModule. Take a look at the PPO algorithm, in particular PPO.get_default_policy_class(), to see how to subclass the Policy when using the RLModule, i.e. which methods to implement.
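To make the relationship concrete, here is a minimal sketch of the pattern in plain Python. It deliberately does not import ray: StandInRLModule, StandInPolicy and StandInAlgorithm are hypothetical stand-ins I made up for illustration, not RLlib classes, and the config key "module_class" is an assumption as well. It only shows why the algorithm still asks for a policy class and how that policy can delegate to the module.

```python
# Stand-in classes only: none of these come from ray.rllib.
# They mimic the roles described above, not the real signatures.


class StandInRLModule:
    """Plays the role of the RLModule: owns the network's forward pass."""

    def forward_exploration(self, obs_batch):
        # Trivial "network": choose action 0 for every observation.
        return {"actions": [0 for _ in obs_batch]}


class StandInPolicy:
    """Plays the role of a Policy subclass that wraps the module."""

    def __init__(self, module):
        self.module = module

    def compute_actions(self, obs_batch):
        # The policy does no inference itself; it delegates to the module.
        return self.module.forward_exploration(obs_batch)["actions"]


class StandInAlgorithm:
    """Plays the role of the Algorithm: it picks the policy class."""

    @classmethod
    def get_default_policy_class(cls, config):
        # This is the hook that keeps being called during setup,
        # even when the network lives in the module.
        return StandInPolicy

    def __init__(self, config):
        policy_cls = self.get_default_policy_class(config)
        # "module_class" is a hypothetical config key for this sketch.
        self.policy = policy_cls(config["module_class"]())


algo = StandInAlgorithm({"module_class": StandInRLModule})
actions = algo.policy.compute_actions([[0.1, 0.2], [0.3, 0.4]])
```

In the real code you would subclass TorchPolicy or TFPolicy instead of StandInPolicy and check the PPO source for the exact methods to override.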
