I am working with offline RL, specifically CQL. I have trained a policy offline on my preferred data. The policy is stored in a checkpoint that I would like to restore. Since I want to evaluate the policy online in my environment, I make some small adjustments to the configuration. The code looks something like this:
```python
config['env'] = my_env
config['input'] = 'sampler'
trainer = CQLTrainer(config=config)
trainer.restore(checkpoint_path)
```
Running this, I get the error:

```
ValueError: Unknown offline input! config['input'] must either be list of offline files (json) or a D4RL-specific InputReader specifier (e.g. 'd4rl.hopper-medium-v0').
```
This does not make sense to me. How can I evaluate a policy trained with CQL online if it cannot use the `sampler` input?
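One thing I have considered (I am not sure this is the intended pattern) is keeping the offline `input` for the trainer itself and only overriding it for evaluation workers via `evaluation_config`, something like the sketch below. `my_env`, `checkpoint_path`, and the offline JSON path are placeholders from my setup:

```python
from ray.rllib.agents.cql import CQLTrainer

config['env'] = my_env
# Leave the training input pointing at the offline data...
config['input'] = ['/path/to/offline_data.json']
# ...and override it only for the evaluation workers,
# so rollouts during evaluation come from the live env.
config['evaluation_interval'] = 1
config['evaluation_num_episodes'] = 10
config['evaluation_config'] = {'input': 'sampler'}

trainer = CQLTrainer(config=config)
trainer.restore(checkpoint_path)
results = trainer.evaluate()
```

Would this be the right way to get online evaluation, or is there a supported way to switch a restored CQL trainer fully to `sampler` input?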
I am using Ray 1.4.0.