Initialize replay buffer


I’d like to initialize the replay buffer in SAC (or any other off-policy algo) with experience from a non-RL policy. I want the RL agent to start out learning from this policy. What is the best way to do this?

I did something similar for the offline RL agent CQL: ray/ at master · ray-project/ray · GitHub

The key part is to add the dataset in the after_init function in the Trainer template.
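
To make the idea concrete, here is a rough, framework-agnostic sketch of pre-filling a replay buffer with transitions from a non-RL policy before training starts. This is not RLlib's API: `ReplayBuffer`, `ToyEnv`, `scripted_policy`, and `prefill` are made-up names for illustration, and the environment is a toy. In RLlib you would do the equivalent fill step wherever the trainer's buffer is available right after setup.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal FIFO replay buffer (hypothetical; not RLlib's class)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement within one batch.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


class ToyEnv:
    """1-D toy environment (assumption): the state moves by the action,
    and each episode ends after 10 steps."""

    def reset(self):
        self.state, self.t = 0.0, 0
        return self.state

    def step(self, action):
        self.state += action
        self.t += 1
        reward = -abs(self.state - 1.0)  # closer to 1.0 is better
        done = self.t >= 10
        return self.state, reward, done


def scripted_policy(obs):
    """Stand-in for the non-RL policy: step toward the target at 1.0."""
    return 0.5 if obs < 1.0 else -0.5


def prefill(buffer, env, policy, num_steps):
    """Roll out `policy` in `env` and store every transition in `buffer`."""
    obs = env.reset()
    for _ in range(num_steps):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs


buffer = ReplayBuffer(capacity=1000)
prefill(buffer, ToyEnv(), scripted_policy, num_steps=200)

# The off-policy learner's first updates now draw from the scripted policy's data.
batch = buffer.sample(32)
```

Once normal training begins, the agent's own transitions are appended to the same buffer, so with a FIFO buffer the pre-filled data is gradually pushed out as the buffer reaches capacity.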
