Offline data example

asdfg · March 10, 2023, 4:00pm

Hello,

Is there a complete example of how to use offline data with TensorFlow? The example shown below only seems to work with PyTorch. Also, are there any examples using any of the offline RL algorithms, such as BC or CRR?

from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.offline.estimators import (
    ImportanceSampling,
    WeightedImportanceSampling,
    DirectMethod,
    DoublyRobust,
)
from ray.rllib.offline.estimators.fqe_torch_model import FQETorchModel

config = (
    DQNConfig()
    .environment(env="CartPole-v1")
    .framework("torch")
    .offline_data(input_="/tmp/cartpole-out")
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        evaluation_num_workers=1,
        evaluation_duration_unit="episodes",
        evaluation_config={"input": "/tmp/cartpole-eval"},
        off_policy_estimation_methods={
            "is": {"type": ImportanceSampling},
            "wis": {"type": WeightedImportanceSampling},
            "dm_fqe": {
                "type": DirectMethod,
                "q_model_config": {"type": FQETorchModel, "polyak_coef": 0.05},
            },
            "dr_fqe": {
                "type": DoublyRobust,
                "q_model_config": {"type": FQETorchModel, "polyak_coef": 0.05},
            },
        },
    )
)

algo = config.build()
for _ in range(100):
    algo.train()

Thanks!

arturn · April 13, 2023, 11:29pm

Hi @asdfg,

The following code is working for me:

from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.offline.estimators import (
    ImportanceSampling,
    WeightedImportanceSampling,
    DirectMethod,
    DoublyRobust,
)
from ray.rllib.offline.estimators.fqe_torch_model import FQETorchModel

config = (
    DQNConfig()
    .environment(env="CartPole-v1")
    .framework("torch")
    .offline_data(input_=<path>)
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        evaluation_num_workers=1,
        evaluation_duration_unit="episodes",
        off_policy_estimation_methods={
            "is": {"type": ImportanceSampling},
            "wis": {"type": WeightedImportanceSampling},
            "dm_fqe": {
                "type": DirectMethod,
                "q_model_config": {"type": FQETorchModel, "polyak_coef": 0.05},
            },
            "dr_fqe": {
                "type": DoublyRobust,
                "q_model_config": {"type": FQETorchModel, "polyak_coef": 0.05},
            },
        },
    )
)

algo = config.build()
for _ in range(100):
    algo.train()

What version are you on?
Please try with the latest release.

asdfg · April 14, 2023, 1:35am

Hello,

This does not seem to work with TF for me.

arturn · April 14, 2023, 9:47pm

@asdfg, got it. We are not testing this with tf because it’s not supported.
Thanks for reporting this. We are not planning to support this for tf for now.
I’m opening a PR that throws a more informative error here.
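
For readers who hit this before upgrading, here is a minimal sketch of the kind of check described above. It is not the code from the actual RLlib PR: `check_ope_framework` is a hypothetical helper name, and the assumption that only `"torch"` is accepted comes from this thread, not from the library source.

```python
# Hypothetical helper, not part of RLlib: fail early with a clear message
# when off-policy estimation (OPE) is configured with a non-torch framework.
def check_ope_framework(framework, ope_methods):
    """Raise ValueError if OPE methods are requested without framework='torch'."""
    # An empty or missing OPE config needs no check.
    if ope_methods and framework != "torch":
        names = ", ".join(sorted(ope_methods))
        raise ValueError(
            f"Off-policy estimation methods ({names}) are assumed to require "
            f"framework='torch', but got framework={framework!r}."
        )


# Passes silently: torch with OPE methods configured.
check_ope_framework("torch", {"is": {}, "wis": {}})

# Raises: a TensorFlow framework string with OPE methods configured.
try:
    check_ope_framework("tf2", {"is": {}, "wis": {}})
except ValueError as err:
    print(err)
```

Calling a check like this on the framework string and the `off_policy_estimation_methods` dict before `config.build()` would surface the limitation up front instead of failing somewhere inside training.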

arturn · April 14, 2023, 9:54pm
