Hi RLlib community,
Thanks for developing and supporting this great library.
I want to do rollouts and training manually instead of calling
`trainer.train()`. For instance, something like this:
```python
# For 1000 iterations:
for i in range(1000):
    obs = env.reset()
    done = False
    while not done:
        action = policy.compute_action(obs)
        new_obs, rew, done, info = env.step(action)
        # Save rollout data
        replay_buffer.add(obs, action, rew, new_obs, done)
        obs = new_obs
    # Train the policy with the rollout data,
    # and log pg_loss, entropy and so on
    trainer.train(policy, replay_buffer)
```
I found this link and
SampleBatchBuilder, but the example script doesn't include any training.
I know I'll lose the parallel-training benefits this way, but that doesn't matter; fine-grained control over manual rollouts and training is more important to me. I can't find any information on how to do this, though. I'd appreciate any help!
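To make the control flow I'm after concrete, here is a stand-alone, runnable sketch. `StubEnv`, `StubPolicy`, and the in-memory buffer are hypothetical placeholders (not RLlib classes); I'd want to replace them with RLlib's real policy and sample-batch machinery:

```python
import random

class StubEnv:
    """Hypothetical stand-in for a gym-style environment."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), 1.0, done, {}

class StubPolicy:
    """Hypothetical stand-in for an RLlib policy."""
    def compute_action(self, obs):
        return random.choice([0, 1])

    def learn_on_batch(self, batch):
        # A real policy would run a gradient step here and
        # return training stats (pg_loss, entropy, ...).
        return {"pg_loss": 0.0, "entropy": 0.0, "num_samples": len(batch)}

def run(num_iters=3):
    env, policy = StubEnv(), StubPolicy()
    all_stats = []
    for _ in range(num_iters):
        buffer = []  # rollout data for this iteration
        obs = env.reset()
        done = False
        while not done:
            action = policy.compute_action(obs)
            new_obs, rew, done, info = env.step(action)
            buffer.append((obs, action, rew, new_obs, done))
            obs = new_obs
        # Train on the collected rollout and keep the stats for logging
        all_stats.append(policy.learn_on_batch(buffer))
    return all_stats

stats = run()
```

The point is the shape of the loop: collect a full episode, hand exactly that data to a training step, and get the loss/entropy stats back for logging.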
Also, how can I change hyperparameters mid-training? For example, suppose that in a training run of 8000 epochs I decide at epoch 1000 to change the entropy coefficient. How would I do that? Is it straightforward?
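In the manual loop I'm imagining, what I'd hope for is something like the following stand-alone sketch. `StubPolicy` and its `entropy_coeff` attribute are my own placeholders, not confirmed RLlib names:

```python
class StubPolicy:
    """Hypothetical policy whose loss uses an entropy coefficient."""
    def __init__(self, entropy_coeff=0.01):
        self.entropy_coeff = entropy_coeff

    def learn_on_batch(self, batch):
        # A real loss would add self.entropy_coeff * entropy here;
        # we just report the coefficient that was in effect.
        return {"entropy_coeff": self.entropy_coeff}

policy = StubPolicy(entropy_coeff=0.01)
coeffs = []
for epoch in range(8000):
    if epoch == 1000:
        # Mid-training hyperparameter change at epoch 1000
        policy.entropy_coeff = 0.001
    stats = policy.learn_on_batch([])
    coeffs.append(stats["entropy_coeff"])
```

If owning the loop lets me mutate the coefficient between epochs like this, that alone would justify the manual setup for me.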