Hi @zygis, and welcome to the forum!
Great question! The RLModule is part of our new stack and is meant to replace the ModelV2 (not the Policy). The RLModule only needs the Policy if the algorithm samples with the RolloutWorker. In our new stack we are going to replace this latter class with the EnvRunner, which will not need a Policy anymore. However, until the new stack is complete and fully tested, you still need to subclass the TFPolicy or TorchPolicy to use the RLModule. Take a look at the PPO algorithm to understand how to subclass the Policy when using the RLModule, i.e. which methods to implement, and take a look at PPO.get_default_policy_class().
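
If it helps, here is a rough sketch of one way to inspect the policy class that PPO picks, so you can see which methods it defines itself and which base classes it builds on. The helper names (own_methods, describe, describe_ppo_policy) are made up for this example and are not part of RLlib. The import path and the way the config is passed are assumptions based on recent Ray 2.x releases, so they may need adjusting for your version, and depending on the release you may also have to enable the RLModule API on the config first.

```python
import inspect


def own_methods(cls):
    """Names of the plain functions defined directly on `cls` (not inherited)."""
    return sorted(
        name for name, member in vars(cls).items() if inspect.isfunction(member)
    )


def describe(cls):
    """Summarize a class: its name, its base classes, and the methods it defines itself."""
    return {
        "name": cls.__name__,
        # Skip the class itself (first MRO entry) and `object` (last MRO entry).
        "bases": [base.__name__ for base in cls.__mro__[1:-1]],
        "own_methods": own_methods(cls),
    }


def describe_ppo_policy(framework="torch"):
    """Describe PPO's default policy class (hypothetical helper, needs Ray RLlib)."""
    # Imported locally so the helpers above also work without Ray installed.
    # Assumption: this import path and call signature match your Ray version.
    from ray.rllib.algorithms.ppo import PPO, PPOConfig

    config = PPOConfig().framework(framework=framework)
    return describe(PPO.get_default_policy_class(config))
```

Calling describe_ppo_policy() in an environment with Ray installed should return the name of the policy class, its base classes, and the methods it overrides, which is a reasonable starting list of what your own subclass may need to implement.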