Custom rollout and training loop

Hi RLlib community,

Thanks for developing and supporting this great library.
I want to manually do rollouts and training instead of calling trainer.train(). For instance, something like this:

# For 1000 iterations:
for iteration in range(1000):
    obs = env.reset()
    done = False

    while not done:
        action = policy.compute_action(obs)
        new_obs, rew, done, info = env.step(action)

        # Save rollout data
        replay_buffer.add(obs, action, rew, new_obs, done)
    
        obs = new_obs

    # Train the policy with the rollout data, 
    # and log pg_loss, entropy and so on
    trainer.train(policy, replay_buffer)

I found this link and SampleBatchBuilder, but the script doesn’t do any training.

I know I’ll lose the benefits of parallel training this way, but that doesn’t matter; fine-grained control over rollouts and training is more important to me. I can’t find any information on how to do this, though. I would appreciate any help!

Also, how can I change hyperparameters mid-training? For example, suppose in a training session with 8000 epochs, I decide at epoch 1000 that I need to change the entropy coefficient. How can I change it? Is it straightforward?

Hi @pouyahmdn ,

you say you want to do training manually, but the last line of your pseudocode reads trainer.train(). If you want to implement training manually, you actually need a loss function, model(s), and an optimizer. It is still unclear from your pseudocode what you want to do.

My suggestion is to clarify whether you only want to run the rollout steps manually but still use RLlib for training, or whether you also want to provide the training algorithm yourself.

What SampleBatchBuilder does is use the JsonWriter class to write sample batches out in JSON format for later (offline) training. You can then feed the written sample batches into a trainer.
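To illustrate the write-out idea, here is a simplified stand-in (TinyBatchBuilder is a toy class written for this post, not RLlib's actual SampleBatchBuilder/JsonWriter): per-step values are accumulated into one batch dict, which is then written as a JSON line that an offline reader can consume later.

```python
import json
import os
import tempfile

class TinyBatchBuilder:
    """Toy stand-in for RLlib's SampleBatchBuilder: accumulates
    per-step values and emits them as one batch dict."""
    def __init__(self):
        self.buffers = {}

    def add_values(self, **row):
        for key, value in row.items():
            self.buffers.setdefault(key, []).append(value)

    def build_and_reset(self):
        batch = self.buffers
        self.buffers = {}
        return batch

# Collect a toy two-step episode and write it as one JSON line,
# mimicking what JsonWriter does with real sample batches.
builder = TinyBatchBuilder()
builder.add_values(obs=0, action=1, reward=0.5, done=False)
builder.add_values(obs=1, action=0, reward=1.0, done=True)
batch = builder.build_and_reset()

path = os.path.join(tempfile.mkdtemp(), "episode.json")
with open(path, "w") as f:
    f.write(json.dumps(batch) + "\n")

# Reading the line back restores the batch for offline use.
with open(path) as f:
    restored = json.loads(f.readline())
print(restored["reward"])  # → [0.5, 1.0]
```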

Hi @Lars_Simon_Zehnder ,

By train() in the pseudocode, I mean a single iteration of some RL algorithm (like DQN or A2C) on the manually rolled-out batch data. So the loss function should already be defined and handled inside that train() function. The difference is that I manually roll out batches and give them to the trainer, instead of the trainer doing all of that in the background.
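To make the shape of that loop concrete, here is a framework-free sketch. DummyPolicy, DummyEnv, and train_on_buffer are all invented stand-ins for this post (a real DQN/A2C update would replace the placeholder statistic), but the control flow is the one described above: roll out one episode manually, then run one training iteration on it.

```python
import random

class DummyPolicy:
    """Stand-in for an RL policy: keeps a running value estimate."""
    def __init__(self):
        self.value = 0.0

    def compute_action(self, obs):
        return random.choice([0, 1])

    def train_on_buffer(self, buffer):
        # One "training iteration": nudge the value estimate toward
        # the mean observed reward (placeholder for a real update).
        rewards = [step[2] for step in buffer]
        mean_reward = sum(rewards) / len(rewards)
        self.value += 0.1 * (mean_reward - self.value)
        return {"mean_reward": mean_reward}

class DummyEnv:
    """Three-step episode that always pays reward 1.0."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}

policy, env = DummyPolicy(), DummyEnv()
for _ in range(10):
    replay_buffer, obs, done = [], env.reset(), False
    while not done:
        action = policy.compute_action(obs)
        new_obs, rew, done, info = env.step(action)
        replay_buffer.append((obs, action, rew, new_obs, done))
        obs = new_obs
    stats = policy.train_on_buffer(replay_buffer)  # one update per episode

print(round(policy.value, 3))
```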

@pouyahmdn ,

I am afraid this does not work that way (sampling a single episode and then training a single step using a trainer). The reason is that each trainer comes with an execution plan (see, for example, dqn.py). In this execution plan, sampling and training alternate, so sampling happens inside the trainer no matter what.
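Conceptually, the alternation inside such an execution plan looks like this (a toy sketch of the idea, not RLlib's actual execution-plan API; the sample/train functions are stand-ins):

```python
def execution_plan(sample_fn, train_fn, num_iters):
    """Toy version of a trainer's loop: the trainer itself alternates
    between collecting samples and training on them."""
    results = []
    for _ in range(num_iters):
        batch = sample_fn()  # sampling happens inside the trainer
        results.append(train_fn(batch))
    return results

# Stand-in sample/train functions for illustration.
counter = {"n": 0}

def sample_fn():
    counter["n"] += 1
    return list(range(4))  # pretend batch of 4 transitions

def train_fn(batch):
    return {"batch_size": len(batch)}

out = execution_plan(sample_fn, train_fn, num_iters=3)
print(counter["n"], out[-1])  # each train step was preceded by a sample step
```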

What you can do is write out your sampled episodes with SampleBatchBuilder and then use the RLlib Input API to do actual offline training on them.
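A sketch of what that offline setup might look like (the path and env name are placeholders; "input" is the Input API hook that makes the trainer read previously written JSON sample batches instead of sampling live):

```python
# Config fragment only: assumes experiences were already written out
# as JSON sample batches, e.g. to /tmp/my-rollouts.
config = {
    "env": "CartPole-v0",
    # Read training data from the saved sample batches
    # instead of sampling from the live environment.
    "input": "/tmp/my-rollouts",
    # Off-policy evaluation is not needed here; disable it.
    "input_evaluation": [],
}
# trainer = DQNTrainer(config=config)  # then trainer.train() as usual
```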

Hope this helps
