Writing custom RLModule for custom Algorithm

Hi @zygis, and welcome to the forum!

Great question! The RLModule is part of our new stack and replaces the ModelV2 (not the Policy). The RLModule only needs the Policy if the algorithm samples with the RolloutWorker. In the new stack we are going to replace the RolloutWorker with the EnvRunner, which will not need a Policy anymore. However, until the new stack is complete and fully tested, you still need to subclass TFPolicy or TorchPolicy to use the RLModule. Take a look at the PPO algorithm to see how to subclass the Policy when using the RLModule, i.e. which methods to implement. PPO.get_default_policy_class() is a good place to start.
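If it helps, here is a rough sketch of one way to see which methods PPO's policy class defines on top of the base Policy. The helper `methods_defined_below` and the function `show_ppo_policy_methods` are names I made up for illustration, not part of RLlib. I am also assuming a Ray version in which `PPO.get_default_policy_class` takes a config object and `ray.rllib.policy.policy.Policy` is the base class, so adjust the imports if your version differs.

```python
import inspect


def methods_defined_below(cls, base):
    """Names of methods defined by `cls` and the classes that precede
    `base` in its MRO, i.e. what was added or overridden on top of `base`.
    (Hypothetical helper, not an RLlib API.)"""
    names = set()
    for klass in cls.__mro__:
        if klass is base:
            break
        for name, member in vars(klass).items():
            if inspect.isfunction(member) or isinstance(
                member, (classmethod, staticmethod)
            ):
                names.add(name)
    return sorted(names)


def show_ppo_policy_methods():
    """Print the methods PPO's default policy class defines itself.

    Assumption: Ray RLlib is installed and still exposes these import
    paths and this classmethod signature in your version.
    """
    from ray.rllib.algorithms.ppo import PPO, PPOConfig
    from ray.rllib.policy.policy import Policy

    policy_cls = PPO.get_default_policy_class(PPOConfig())
    print(policy_cls.__name__)
    for name in methods_defined_below(policy_cls, Policy):
        print("  ", name)
```

Calling `show_ppo_policy_methods()` in an environment with RLlib installed should give you a list of candidate methods to look at when writing the Policy subclass for your own algorithm. It is only a starting point; reading the PPO policy source is still the more reliable guide.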
