I was wondering what the easiest way is to train an off-policy algorithm (e.g. SAC) on the sample batches collected by RolloutWorker.sample(). Roughly, I would like to do this:
for i in range(num_iters):
    samples = worker.sample()  # batch collected by the RolloutWorker
    # use the samples to train a SAC policy (how?)
    # get the critic network from the trained policy (how?)
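
To make the question more concrete, here is a minimal sketch of the loop structure I have in mind, assuming the usual off-policy pattern of pushing each collected batch into a replay buffer and updating on a minibatch drawn from it. Everything in it is a hypothetical stand-in written for illustration and none of it is RLlib API: `fake_worker_sample` plays the role of `worker.sample()`, `DummyLearner.update` plays the role of the SAC update, and `ReplayBuffer` is a toy FIFO buffer.

```python
import random
from collections import deque


class ReplayBuffer:
    """Toy FIFO replay buffer (hypothetical, not RLlib's implementation)."""

    def __init__(self, capacity, seed=0):
        # deque with maxlen silently drops the oldest transitions when full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add_batch(self, transitions):
        self._storage.extend(transitions)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self._storage))
        return self._rng.sample(list(self._storage), k)

    def __len__(self):
        return len(self._storage)


def fake_worker_sample(rng, rollout_len):
    """Stand-in for worker.sample(): returns a list of
    (obs, action, reward, next_obs, done) tuples with random contents."""
    return [
        (rng.random(), rng.choice([0, 1]), rng.random(), rng.random(), False)
        for _ in range(rollout_len)
    ]


class DummyLearner:
    """Stand-in for the SAC policy update; it only counts the updates."""

    def __init__(self):
        self.num_updates = 0

    def update(self, minibatch):
        # A real learner would compute the critic and actor losses here.
        self.num_updates += 1
        return {"batch_size": len(minibatch)}


def train(num_iters, rollout_len=8, capacity=32, batch_size=4, seed=0):
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity, seed=seed)
    learner = DummyLearner()
    for _ in range(num_iters):
        # 1) Collect fresh experience with the current policy.
        buffer.add_batch(fake_worker_sample(rng, rollout_len))
        # 2) Train on a minibatch that mixes old and new transitions.
        learner.update(buffer.sample(batch_size))
    return buffer, learner


if __name__ == "__main__":
    buffer, learner = train(num_iters=10)
    print(len(buffer), learner.num_updates)
```

What I am unsure about is which RLlib objects should take the place of the buffer and the learner here, and how to get at the critic network once training is done.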