Doubly Robust off-policy estimation method

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

RLlib supports importance sampling and weighted importance sampling to compute off-policy estimates. Will RLlib also support the Doubly Robust off-policy estimation method in a future release?


Hey @steff,
we have “implementing some SOTA off-policy estimators” on our short-term TODO list (including doubly robust OPE). We already “opened up” the off-policy estimator API in a past PR, so you can now configure your own custom estimators:

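  input_evaluation: [MyOwnPolicyEstimatorClass]

# With:
from ray.rllib.offline.off_policy_estimator import OffPolicyEstimator


class MyOwnPolicyEstimatorClass(OffPolicyEstimator):
    def estimate(self, batch):
        # Compute and return the off-policy estimate for one batch here.
        ...

    def process(self, batch):
        # Called for each batch; typically records the result of estimate().
        ...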
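
For anyone who wants to try doubly robust OPE before it lands in RLlib, here is a minimal sketch of the per-episode estimate in plain Python, using the backward recursion V_DR = V(s_t) + rho_t * (r_t + gamma * V_DR_next - Q(s_t, a_t)). This is not RLlib code: the function name and all argument names are hypothetical, and it assumes you already have per-step action probabilities under both policies plus Q and V estimates from some model you fitted yourself.

```python
def doubly_robust_estimate(rewards, target_probs, behavior_probs,
                           q_values, v_values, gamma=0.99):
    """Doubly robust value estimate for a single episode (illustrative sketch).

    All arguments are equal-length sequences indexed by time step t:
      rewards[t]        -- observed reward r_t
      target_probs[t]   -- probability of the logged action under the target policy
      behavior_probs[t] -- probability of the logged action under the behavior policy
      q_values[t]       -- model estimate of Q(s_t, a_t) for the target policy
      v_values[t]       -- model estimate of V(s_t) for the target policy
    """
    steps = list(zip(rewards, target_probs, behavior_probs, q_values, v_values))
    v_dr = 0.0  # value after the final step of the episode
    # Walk the episode backwards, correcting the model estimate at each step
    # with the importance-weighted one-step error.
    for r, pi_target, pi_behavior, q, v in reversed(steps):
        rho = pi_target / pi_behavior  # per-step importance ratio
        v_dr = v + rho * (r + gamma * v_dr - q)
    return v_dr
```

With all Q and V estimates set to zero this reduces to step-wise importance sampling, and with a perfect model the correction terms vanish and the result is just V(s_0). A custom `estimate()` could call something like this once per episode; how the probabilities and model values are pulled out of the batch is left open here, since that depends on the RLlib version.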

Hi Sven,

Thanks for your reply.

Do you have an approximate date for when the doubly robust OPE will be released?

Thanks for sharing the API to create a custom OPE.