[RLlib] Batch size for complete_episodes issue

tibogiss · May 3, 2021, 7:30am

Hi everyone,

Running the simple script below, where my goal was to check the size of training batches, I realised that things did not behave as I expected. Here I have 1 worker, an environment whose episodes always last 10 steps, and a training batch size of 10 (for simplicity I also set 1 SGD iteration and a minibatch size of 10). When printing the actual size of the batch (len(batch)), I get a size of 200.

It is worth noting that if I keep "batch_mode": "complete_episodes" and explicitly set "rollout_fragment_length": 10, the size is right, as if rollout_fragment_length were still taken into account in this mode.

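import gym
import numpy as np
import ray
from gym import spaces
from ray import tune

# Used both as the episode length and as the train batch size.
batch_size = 10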
class TestEnv(gym.Env):
    def __init__(self, config=None):
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(2,))
        self.s = 0
    def step(self, action):
        self.s += 1
        return np.random.rand(2), 1, self.s >= batch_size, {}
    def reset(self):
        self.s = 0
        return np.random.rand(2)
ray.init(num_cpus=1, local_mode=True)
config = dict(
    **{
        "env": TestEnv,
        "num_gpus": 0,
        "framework": "tfe",
        "num_workers": 1,
        "batch_mode": "complete_episodes",
        "train_batch_size": batch_size,
        "sgd_minibatch_size": batch_size,
        "num_sgd_iter": 1,
    }
)
tune.run("PPO", config=config)

Thanks for your help

sven1977 · May 3, 2021, 7:38am

Hey @tibogiss , thanks for the post

Hmm, yeah, “train_batch_size” is not entirely respected by RLlib here because “rollout_fragment_length” is 200 (the default value). So the 1 worker always collects at least 200 steps (and maybe even more if batch_mode=complete_episodes and it has to finish an episode) from the env before sending the batch back to the driver (for learning).
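
To see where the 200 comes from, the collection loop can be mimicked in plain Python. This is only a rough sketch, not RLlib code: `steps_collected` is a hypothetical helper, and it assumes a single worker and episodes of a fixed length, as in the script above.

```python
def steps_collected(rollout_fragment_length, episode_len, batch_mode):
    """Count the env steps one (hypothetical) worker gathers per sample call.

    Assumption: every episode lasts exactly `episode_len` steps.
    """
    steps = 0
    while True:
        steps += 1
        episode_done = steps % episode_len == 0
        enough_steps = steps >= rollout_fragment_length
        # truncate_episodes: stop as soon as the fragment length is reached.
        if batch_mode == "truncate_episodes" and enough_steps:
            return steps
        # complete_episodes: additionally wait for the running episode to end.
        if batch_mode == "complete_episodes" and enough_steps and episode_done:
            return steps


# Default rollout_fragment_length=200 with 10-step episodes.
default_size = steps_collected(200, 10, "complete_episodes")
# rollout_fragment_length=10 set explicitly, as tried in the original post.
explicit_size = steps_collected(10, 10, "complete_episodes")
# An episode length that does not divide the fragment length overshoots.
overshoot_size = steps_collected(200, 30, "complete_episodes")
print(default_size, explicit_size, overshoot_size)
```

Under these assumptions the three calls give 200, 10, and 210 steps, which matches the sizes reported in the first post: train_batch_size never enters the calculation on the worker side.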

2 options here:

1. I agree it is confusing and we may want to change that into: collect a batch on the worker(s) (according to rollout_fragment_length) → split the concatenated batch (from n workers) into train_batch_size chunks and do n training calls. However, this would have the same effect as the current logic, as the worker policies are not updated in between the n training calls (they have “over-collected”, so to speak).
2. I think the better solution would be to warn and auto-adjust the rollout_fragment_length in this case: rollout_fragment_length = train_batch_size / num_workers. Something like this.
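
The two options can be sketched in a few lines of plain Python. Both functions are hypothetical illustrations, not the code RLlib uses: the names, the floor division, and treating 0 workers as 1 sampling process are assumptions made for this example.

```python
import warnings


def split_into_train_batches(collected_steps, train_batch_size):
    """Option 1: cut an over-collected batch into train_batch_size chunks."""
    return [
        collected_steps[i:i + train_batch_size]
        for i in range(0, len(collected_steps), train_batch_size)
    ]


def auto_adjust_rollout_fragment_length(config):
    """Option 2: warn and shrink rollout_fragment_length to fit train_batch_size.

    Hypothetical helper; returns a new dict and leaves `config` untouched.
    """
    # Assumption: with num_workers=0 a single process does the sampling.
    num_workers = max(config["num_workers"], 1)
    collected = config["rollout_fragment_length"] * num_workers
    if collected == config["train_batch_size"]:
        return dict(config)
    # Assumption: round down, but never below one timestep per worker.
    adjusted = max(config["train_batch_size"] // num_workers, 1)
    warnings.warn(
        f"rollout_fragment_length * num_workers = {collected} does not match "
        f"train_batch_size = {config['train_batch_size']}; "
        f"using rollout_fragment_length = {adjusted} instead."
    )
    return {**config, "rollout_fragment_length": adjusted}


config = {"num_workers": 1, "train_batch_size": 10, "rollout_fragment_length": 200}
# Option 1: 200 collected steps become 20 chunks of 10, all from the same policy.
chunks = split_into_train_batches(list(range(200)), config["train_batch_size"])
# Option 2: the worker is told to collect only 10 steps per sample call.
adjusted = auto_adjust_rollout_fragment_length(config)
print(len(chunks), adjusted["rollout_fragment_length"])
```

With the settings from the first post, option 1 yields 20 training calls on data from one unchanged policy, while option 2 reduces the fragment length to 10 so that a single training call sees fresh data.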

I agree that if a user sets train_batch_size to something, we would expect that size to arrive in Policy.learn_on_batch.

sven1977 · May 3, 2021, 9:05am

Let’s try this here. Pending tests passing. This should alleviate the confusion about the batch size arriving in Policy.learn_on_batch being different from the user defined train_batch_size.
Note: A mismatch can still occur if batch_mode=complete_episodes, which is unavoidable for episodes that take longer than rollout_fragment_length.
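
For fixed-length episodes, the remaining mismatch is plain arithmetic: sampling stops at the first episode boundary at or after rollout_fragment_length. A small sketch under that assumption, with one worker (`complete_episodes_batch_size` is a name made up for illustration):

```python
import math


def complete_episodes_batch_size(rollout_fragment_length, episode_len):
    """Steps returned when sampling stops at the first episode boundary
    at or after rollout_fragment_length (fixed-length episodes assumed)."""
    num_episodes = max(math.ceil(rollout_fragment_length / episode_len), 1)
    return num_episodes * episode_len


# Episode length equals the (auto-adjusted) fragment length: sizes match.
matched = complete_episodes_batch_size(10, 10)
# Episodes longer than the fragment length: the whole episode is returned.
overshoot = complete_episodes_batch_size(10, 25)
print(matched, overshoot)
```

Here the first case returns exactly 10 steps, while the second returns 25 even though rollout_fragment_length is 10.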

github.com/ray-project/ray

[RLlib] Discussion 2022: PPO should auto-adjust `rollout_fragment_length` if other settings do not align with `train_batch_size`.

ray-project:master ← sven1977:discussion_2022_auto_adjust_rollout_frag_len_if_train_batch_size_mismatch

opened 09:05AM - 03 May 21 UTC

sven1977

+20 -0

To avoid confusion about batch sizes being very different in `Policy.learn_on_…batch` from the given `train_batch_size`, PPO should auto-adjust `rollout_fragment_length` if other settings do not align with `train_batch_size`. Also see this discussion here: https://discuss.ray.io/t/batch-size-for-complete-episodes-issue/2022/2

2dm · January 21, 2022, 8:52pm

Hi @sven1977 ,

I am a bit confused about the need to warn and adjust rollout_fragment_length when batch_mode=complete_episodes. In the documentation it is stated that for complete_episodes "The rollout_fragment_length setting will be ignored", so getting a warning about this parameter would make me believe something is wrong with my configuration (i.e. it doesn’t run complete episodes). Would you agree, or do I misunderstand the whole slicing flow?

Sorry for reviving this old post. I hope it’s still relevant.

sven1977 · January 26, 2022, 11:58am

Hey @2dm , thanks for the catch. It’s not entirely correct that the rollout_fragment_length is ignored in case batch_mode="complete_episodes". I’ll fix that right away in the docs.

If batch_mode=complete_episodes:

- RolloutWorker.sample() will run for at least 1 episode.
- RolloutWorker.sample() will always run for n complete episodes. The returned batch will thus always have dones[-1] == True.
- RolloutWorker.sample() will at least run for rollout_fragment_length timesteps. # ← will add this to the docs; seems to be missing there.
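
These three guarantees can be checked on a toy sampler. The function below is a hypothetical stand-in written for this illustration, not RolloutWorker.sample() itself; the random episode lengths and the `max_episode_len` parameter are assumptions.

```python
import random


def sample_complete_episodes(rollout_fragment_length, max_episode_len=15, seed=0):
    """Toy stand-in for a worker's sample call in complete_episodes mode.

    Returns a list of `done` flags, one per collected timestep.
    """
    rng = random.Random(seed)
    dones = []
    # Keep appending whole episodes until the fragment length is reached.
    # The loop body runs at least once, so at least one episode is collected.
    while True:
        episode_len = rng.randint(1, max_episode_len)
        dones.extend([False] * (episode_len - 1) + [True])
        if len(dones) >= rollout_fragment_length:
            return dones


dones = sample_complete_episodes(rollout_fragment_length=200)
num_episodes = sum(dones)  # each True marks the end of one episode
print(len(dones), num_episodes, dones[-1])
```

Whatever the episode lengths turn out to be, the returned batch ends on an episode boundary, contains at least one episode, and has at least rollout_fragment_length timesteps.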

sven1977 · February 3, 2022, 10:58am

Doc-fix PR: https://github.com/ray-project/ray/pull/22074

