Hello,
I’d like to initialize the replay buffer in SAC (or any other off-policy algo) with experience from a non-RL policy. I want the RL agent to start out learning from this policy. What is the best way to do this?
I did something similar for the offline RL agent CQL: ray/cql.py at master · ray-project/ray · GitHub
The key part is to add the dataset to the replay buffer in the `after_init` function of the Trainer template.
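In case it helps, here is a rough sketch of that pattern for SAC. It assumes the Ray 1.x Trainer-template API (`SACTrainer.with_updates`, a `local_replay_buffer` attribute on the built trainer, and `JsonReader` for RLlib's JSON offline format); the path, batch count, and exact buffer method are illustrative and may differ across Ray versions, so treat this as a starting point rather than the exact `cql.py` code:

```python
# Sketch: pre-fill SAC's replay buffer with experiences recorded from a
# non-RL policy, using an after_init hook on the Trainer template.
# Assumptions (hedged): Ray 1.x API, trainer.local_replay_buffer exists,
# LocalReplayBuffer exposes add_batch(); check your Ray version.
from ray.rllib.agents.sac import SACTrainer, DEFAULT_CONFIG
from ray.rllib.offline.json_reader import JsonReader

# Illustrative path to JSON experiences previously recorded from the non-RL policy.
OFFLINE_DATA_PATH = "/tmp/demo-out"


def after_init(trainer):
    """Pre-fill the trainer's local replay buffer from an offline dataset."""
    reader = JsonReader(OFFLINE_DATA_PATH)
    num_batches_to_preload = 1000  # illustrative; size to your dataset
    for _ in range(num_batches_to_preload):
        batch = reader.next()
        # Older LocalReplayBuffer versions expose add_batch(); newer ones
        # may use a different method name.
        trainer.local_replay_buffer.add_batch(batch)


# Build a SAC variant that runs the pre-fill step right after setup.
# Training then continues with the normal env sampler, so the agent
# starts from the demonstration data and keeps learning online.
PrefilledSAC = SACTrainer.with_updates(
    name="SACWithPrefilledBuffer",
    after_init=after_init,
)

config = DEFAULT_CONFIG.copy()
trainer = PrefilledSAC(config=config, env="Pendulum-v0")
```

The offline file itself can be produced by rolling out your non-RL policy and writing the transitions with RLlib's `SampleBatchBuilder` / `JsonWriter` (see the "saving experiences" example in the RLlib offline-data docs).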