Offline RL evaluation

I am trying to run offline RL to train CartPole-v0 using a dataset I have already stored on disk. However, during training I get 'nan' for the episode rewards. Here is my code:
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.offline.estimators import (
    ImportanceSampling,
    WeightedImportanceSampling,
    DirectMethod,
    DoublyRobust,
)
from ray.rllib.offline.estimators.fqe_torch_model import FQETorchModel

config = (
    DQNConfig()
    .environment(env="CartPole-v0")
    .framework("torch")
    .offline_data(input_="cartpole-train")
    # .metrics_smoothing_episodes()
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        evaluation_num_workers=1,
        evaluation_duration_unit="episodes",
        evaluation_config={"input": "cartpole-eval"},
        off_policy_estimation_methods={
            "is": {"type": ImportanceSampling},
            "wis": {"type": WeightedImportanceSampling},
            "dm_fqe": {
                "type": DirectMethod,
                "q_model_config": {"type": FQETorchModel, "polyak_coef": 0.05},
            },
            "dr_fqe": {
                "type": DoublyRobust,
                "q_model_config": {"type": FQETorchModel, "polyak_coef": 0.05},
            },
        },
    )
)

algo = config.build()
for _ in range(10):
    print(algo.train())

The nan episode reward is expected, since no online evaluation against an actual environment takes place. The offline evaluation results are reported under the following keys: evaluation: {off_policy_estimator: {"is": {...}, "wis": {...}, ...}}
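As a concrete sketch of how to read those estimates from the result dict returned by algo.train() (the metric names in the comment follow the Ray 2.3 OPE docs and may differ in other versions):

# Minimal sketch, assuming the config above has already been built.
result = algo.train()
ope_results = result["evaluation"]["off_policy_estimator"]
for name, metrics in ope_results.items():
    # e.g. "is" -> dict of estimates such as v_behavior / v_target / v_gain
    print(name, metrics)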

See: Working With Offline Data — Ray 2.3.1
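For reference, inputs like "cartpole-train" are usually experience files recorded to disk beforehand. A minimal sketch of how such data can be generated, assuming a behavior policy is simply trained online with recording enabled (the PPO choice and output path are illustrative, not taken from the original post):

from ray.rllib.algorithms.ppo import PPOConfig

# Record sample batches from a behavior policy to disk; the resulting
# directory can later be passed to offline_data(input_=...).
record_config = (
    PPOConfig()
    .environment(env="CartPole-v0")
    .framework("torch")
    .offline_data(output="cartpole-train")  # write experiences to ./cartpole-train
)
recorder = record_config.build()
for _ in range(5):
    recorder.train()
recorder.stop()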