I use RLlib (Ray 2.6.3), specifically PPO, for my task. I have a question about the PPO configuration that is still not clear to me.
Is there a connection between these two training variables: “num_rollout_workers” and “train_batch_size”? For example, when I set “num_rollout_workers” to two, do I have to multiply “train_batch_size” by the number of rollout workers in the configuration?
Many thanks for your support in advance!
Yes, there is a relationship between num_rollout_workers and train_batch_size in the configuration of PPO in RLlib.
The num_rollout_workers parameter specifies the number of workers that are used for environment sampling. Each of these workers collects samples from the environment in parallel, which can significantly speed up data collection.
On the other hand, train_batch_size is the total number of samples, collected by all rollout workers combined, that the algorithm uses for each training iteration.
So, if you have num_rollout_workers=2, it doesn’t mean you have to multiply the train_batch_size by 2. However, you should ensure that train_batch_size is large enough to accommodate the samples collected by all the workers.
In other words, train_batch_size should be greater than or equal to num_rollout_workers * rollout_fragment_length * num_envs_per_worker. This is because, in each sampling round, each worker collects rollout_fragment_length * num_envs_per_worker samples before sending them to the learner.
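To make the arithmetic concrete, here is a minimal sketch in plain Python. The helper name samples_per_sampling_round is made up for illustration and is not part of RLlib; it only reproduces the product described above. It also assumes that with num_rollout_workers=0 the local worker does the sampling, so at least one sampler is counted.

```python
def samples_per_sampling_round(
    num_rollout_workers: int,
    rollout_fragment_length: int,
    num_envs_per_worker: int = 1,
) -> int:
    """Samples gathered when every worker returns one fragment per environment."""
    # Assumption: with 0 rollout workers the local worker samples instead,
    # so we count at least one sampler.
    num_samplers = max(num_rollout_workers, 1)
    return num_samplers * rollout_fragment_length * num_envs_per_worker


# Two workers, fragments of 200 steps, one environment per worker.
per_round = samples_per_sampling_round(2, 200, 1)
print(per_round)  # 400

# Illustrative train_batch_size; it only needs to be at least per_round.
train_batch_size = 4000
print(train_batch_size >= per_round)  # True
```

With two workers the lower bound is 400, so any train_batch_size of 400 or more satisfies the condition without being doubled by hand.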
Here’s an example from a forum post:
num_gpus = 0
num_gpus_per_worker = 0
num_cpus_for_local_worker = 1
num_cpus_per_worker = 1
num_rollout_workers = 1
rollout_fragment_length = 200
train_batch_size = 200 #must be = rollout_fragment_length * num_rollout_workers * num_envs_per_worker
sgd_minibatch_size = 32
In this example, train_batch_size is set to 200, which is equal to rollout_fragment_length * num_rollout_workers * num_envs_per_worker (200 * 1 * 1, since num_envs_per_worker defaults to 1).
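For reference, the same values could be written with the PPOConfig builder roughly as follows. This is a sketch under assumptions, not the forum poster's actual script: it assumes Ray 2.6.x, and the environment name "CartPole-v1" is only a placeholder that does not appear in the post.

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Placeholder environment (assumption, not from the forum post).
    .environment("CartPole-v1")
    .resources(
        num_gpus=0,
        num_gpus_per_worker=0,
        num_cpus_for_local_worker=1,
        num_cpus_per_worker=1,
    )
    .rollouts(
        num_rollout_workers=1,
        rollout_fragment_length=200,
        num_envs_per_worker=1,
    )
    .training(
        # 200 = rollout_fragment_length * num_rollout_workers * num_envs_per_worker
        train_batch_size=200,
        sgd_minibatch_size=32,
    )
)

# Building and training is left commented out; it needs a working Ray install.
# algo = config.build()
# result = algo.train()
```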
Note that train_batch_size is a hyperparameter that you can tune based on your specific problem and computational resources. It doesn’t have to be exactly equal to num_rollout_workers * rollout_fragment_length * num_envs_per_worker, but it should be large enough to accommodate the samples collected by all the workers.
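One way to see why the batch size does not need to scale with the worker count: for a fixed train_batch_size, adding workers only reduces how many parallel sampling rounds are needed to fill the batch. The sketch below is plain arithmetic, not RLlib's internal logic; the function name sampling_rounds_needed and the batch size of 4000 are chosen for illustration.

```python
import math


def sampling_rounds_needed(
    train_batch_size: int,
    num_rollout_workers: int,
    rollout_fragment_length: int,
    num_envs_per_worker: int = 1,
) -> int:
    """Parallel sampling rounds needed to gather at least train_batch_size samples."""
    # Assumption: with 0 rollout workers the local worker samples instead.
    per_round = (
        max(num_rollout_workers, 1) * rollout_fragment_length * num_envs_per_worker
    )
    return math.ceil(train_batch_size / per_round)


# Same batch size, different worker counts.
for workers in (1, 2, 4):
    rounds = sampling_rounds_needed(4000, workers, 200)
    print(f"{workers} worker(s): {rounds} sampling round(s)")
```

Under these numbers, one worker needs 20 rounds, two workers need 10, and four workers need 5, while train_batch_size stays at 4000 throughout.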