Offline RL evaluation

I am trying to run offline RL to train CartPole-v0 using a dataset I have already stored on disk. However, during training I get 'nan' for the episode rewards. Here is my code:
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.offline.estimators import (
    ImportanceSampling,
    WeightedImportanceSampling,
    DirectMethod,
    DoublyRobust,
)
from ray.rllib.offline.estimators.fqe_torch_model import FQETorchModel

config = (
    DQNConfig()
    .environment(env="CartPole-v0")
    .framework("torch")
    .offline_data(input_="cartpole-train")
    # .metrics_smoothing_episodes()
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        evaluation_num_workers=1,
        evaluation_duration_unit="episodes",
        evaluation_config={"input": "cartpole-eval"},
        off_policy_estimation_methods={
            "is": {"type": ImportanceSampling},
            "wis": {"type": WeightedImportanceSampling},
            "dm_fqe": {
                "type": DirectMethod,
                "q_model_config": {"type": FQETorchModel, "polyak_coef": 0.05},
            },
            "dr_fqe": {
                "type": DoublyRobust,
                "q_model_config": {"type": FQETorchModel, "polyak_coef": 0.05},
            },
        },
    )
)

algo = config.build()
for _ in range(10):
    print(algo.train())

The nan episode reward is expected, since no online evaluation against an actual environment takes place. The offline evaluation results are reported under the following keys: evaluation: {off_policy_estimator: {"is": {...}, "wis": {...}, ...}}
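As a concrete sketch of how to read those estimates from the result dict returned by algo.train() (the metric names in the comment follow the Ray 2.3 OPE docs and may differ in other versions):

# Minimal sketch, assuming the config above has already been built.
result = algo.train()
ope_results = result["evaluation"]["off_policy_estimator"]
for name, metrics in ope_results.items():
    # e.g. "is" -> dict of estimates such as v_behavior / v_target / v_gain
    print(name, metrics)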

See: Working With Offline Data — Ray 2.3.1
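For reference, inputs like "cartpole-train" are usually experience files recorded to disk beforehand. A minimal sketch of how such data can be generated, assuming a behavior policy is simply trained online with recording enabled (the PPO choice and output path are illustrative, not taken from the original post):

from ray.rllib.algorithms.ppo import PPOConfig

# Record sample batches from a behavior policy to disk; the resulting
# directory can later be passed to offline_data(input_=...).
record_config = (
    PPOConfig()
    .environment(env="CartPole-v0")
    .framework("torch")
    .offline_data(output="cartpole-train")  # write experiences to ./cartpole-train
)
recorder = record_config.build()
for _ in range(5):
    recorder.train()
recorder.stop()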