@Lars_Simon_Zehnder Yes, thank you. I found a solution in another thread. I wish the docs on converting external experiences to batch format were a bit more explicit that an env is not strictly necessary, and maybe provided an example like the one earlier in this thread.
My only remaining question is: how do I specify the offline data for both the training input and the evaluation input in the config?
import numpy as np
from gymnasium.spaces import Box, Dict, Discrete
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.offline.estimators import ImportanceSampling, WeightedImportanceSampling

config = (
    DQNConfig()
    .framework("tf2")
    .offline_data(
        input_config={
            "paths": ["/root/DRL/reward1/0/train/output-2023-09-10_19-16-56_worker-0_0"],
            "format": "json",
            "input": "dataset",
            "explore": False,
        },
    )
    .environment(
        observation_space=Dict({
            "obs": Box(low=-10000, high=100000, shape=(32,), dtype=np.float32)
        }),
        action_space=Discrete(2),
    )
    .debugging(log_level="INFO")
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        evaluation_num_workers=1,
        evaluation_duration_unit="episodes",
        evaluation_config={
            "paths": ["/root/DRL/reward1/0/test/output-2023-09-10_19-16-56_worker-0_0"],
            "format": "json",
            "explore": False,
            "input": "dataset",
        },
        off_policy_estimation_methods={
            "is": {"type": ImportanceSampling},
            "wis": {"type": WeightedImportanceSampling},
        },
    )
)
I ask because the off-policy estimation methods need the evaluation data to come in as a dataset, but with this setup it seems to be provided as sampler input.
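To make the question concrete, here is the shape I would guess the evaluation side needs, with the dataset reader settings nested under "input_config" inside evaluation_config. This is only a guess based on how the training side is configured; the "input" and "input_config" override keys inside evaluation_config are my assumption:

eval_overrides = {
    # Guess: point the evaluation workers at a dataset reader instead of an env sampler.
    "input": "dataset",
    "input_config": {
        "format": "json",
        "paths": ["/root/DRL/reward1/0/test/output-2023-09-10_19-16-56_worker-0_0"],
    },
    "explore": False,
}

config = config.evaluation(evaluation_config=eval_overrides)

Is nesting it like this the intended way to get the off-policy estimators a dataset?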