What is the cleanest way to train an off-policy algorithm (e.g. SAC) using the sample batches collected by a RolloutWorker.sample()?

Terpragon · February 7, 2023, 11:00pm

I was wondering what the easiest way is to train an off-policy algorithm (e.g. SAC) using the sample batches collected by RolloutWorker.sample(). Something like this:

for n in range(num_iters):
    samples = worker.sample()
    # use the samples to train a SAC policy?
    # get the critic network

arturn · April 19, 2023, 6:47pm

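All RLlib algorithms already sample by default, so you should not need to call RolloutWorker.sample() yourself.
Have a look at SAC.
It inherits from DQN, whose training_step function does what you describe: it collects sample batches from the workers and uses them to train the policy.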
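
For illustration, here is a minimal sketch of the general off-policy pattern (collect a batch, store it in a replay buffer, then train on a minibatch drawn from the buffer). It is plain Python and is not RLlib's actual code: ReplayBuffer, off_policy_loop, sample_fn, update_fn, and learning_starts are hypothetical names made up for this example, where sample_fn stands in for worker.sample() and update_fn stands in for one SAC gradient step.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions (hypothetical stand-in, not RLlib's class)."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transitions when full.
        self._storage = deque(maxlen=capacity)

    def add(self, batch):
        self._storage.extend(batch)

    def sample(self, n, rng):
        # Uniform sampling without replacement from everything stored so far.
        return rng.sample(list(self._storage), n)

    def __len__(self):
        return len(self._storage)


def off_policy_loop(sample_fn, update_fn, num_iters,
                    buffer_capacity=10_000, train_batch_size=32,
                    learning_starts=64, seed=0):
    """Alternate between collecting experience and training from the buffer."""
    rng = random.Random(seed)
    buffer = ReplayBuffer(buffer_capacity)
    num_updates = 0
    for _ in range(num_iters):
        # 1. Collect fresh experience (stand-in for worker.sample()).
        buffer.add(sample_fn())
        # 2. Train only once the buffer holds enough transitions.
        if len(buffer) >= max(learning_starts, train_batch_size):
            # 3. Train on a minibatch of old and new data, not just the latest batch.
            update_fn(buffer.sample(train_batch_size, rng))
            num_updates += 1
    return buffer, num_updates


def fake_sample():
    # Dummy rollout of 16 identical transitions, for demonstration only.
    return [{"obs": 0.0, "action": 0, "reward": 1.0,
             "next_obs": 0.0, "done": False} for _ in range(16)]


seen_batches = []  # records every minibatch passed to the "learner"
buffer, num_updates = off_policy_loop(fake_sample, seen_batches.append, num_iters=10)
```

The point of the buffer is that an off-policy learner such as SAC does not train directly on the latest rollout. Each update draws from all stored transitions, which is why the samples go into the buffer first rather than straight into the policy update.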
