Unexpected behaviour when using ConcatBatches

Hi all, I want to concatenate all the sample batches in a replay buffer and use that experience to compute an update to a forward dynamics model. At the moment I have an issue with ConcatBatches: I would expect the combine operator to give me a single SampleBatch of the same size as the replay buffer, but the size of the returned batches differs from the number of stored samples.

I also tried setting min_batch_size to the size of the replay buffer (local_replay_buffer.num_added). Surprisingly, this gives me SampleBatches of size 32, even as the replay buffer keeps growing.

Could anyone explain the behaviour of ConcatBatches?

    dynamics_op = (
        Replay(local_buffer=local_replay_buffer)
        .combine(
            ConcatBatches(
                min_batch_size=config["train_batch_size"],
            )
        )
        .for_each(DynamicsOneStep(workers))
    )
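
The variant with min_batch_size tied to the buffer size that I mentioned above only differs in that one argument; roughly:

    dynamics_op = (
        Replay(local_buffer=local_replay_buffer)
        .combine(
            ConcatBatches(
                # This still yields SampleBatches of size 32.
                min_batch_size=local_replay_buffer.num_added,
            )
        )
        .for_each(DynamicsOneStep(workers))
    )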

Hi @sn73jq ,

Replay() yields batches of size replay_batch_size.
Normally, your local_replay_buffer will be a MultiAgentReplayBuffer, and by default replay_batch_size will be equal to train_batch_size, which is why you always sample batches of the same size from it!
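
As a quick sanity check, here is a minimal sketch (I am assuming the execution-plan-era Replay import and the LocalIterator.take() helper; double-check against your RLlib version) showing that each replayed item already has that fixed size:

    from ray.rllib.execution.replay_ops import Replay

    # Each item pulled out of Replay() has exactly `replay_batch_size`
    # timesteps, no matter how many samples the buffer currently holds.
    replay_it = Replay(local_buffer=local_replay_buffer)
    for batch in replay_it.take(3):
        # With the default trainer setup this prints train_batch_size each time.
        print(batch.count)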

That being said, the min_batch_size argument of ConcatBatches should influence the size of the batches, as the name suggests. But I suspect that the code you are writing lives inside an execution plan. The execution plan method is only executed once, at the beginning of training. At that point the replay buffer is empty, num_added is zero, and so your min_batch_size becomes zero. This does not change later, because it is not the execution plan method itself that runs on every iteration, but the iterator it returns that gets iterated over. This is not intuitive and will change in the near future. In practice it means the initial min_batch_size argument (which is 0) passed to ConcatBatches stays in effect throughout your experiment.
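
To make the ordering concrete, here is a rough sketch (the plan signature is simplified, and DynamicsOneStep is taken from your snippet) of what happens when min_batch_size is tied to num_added inside the plan:

    def execution_plan(workers, config, **kwargs):
        # This function body runs exactly once, before any sampling has
        # happened, so local_replay_buffer.num_added is still 0 here.
        dynamics_op = (
            Replay(local_buffer=local_replay_buffer)
            .combine(
                # ConcatBatches is constructed with min_batch_size=0 and
                # never re-reads num_added on later training iterations.
                ConcatBatches(min_batch_size=local_replay_buffer.num_added)
            )
            .for_each(DynamicsOneStep(workers))
        )
        # Training only iterates over the iterator returned here; the plan
        # function itself is never called again.
        return dynamics_op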

If you really need batches comprising the whole buffer, the easiest solution for now is probably to modify the replay buffer code yourself: ignore replay_batch_size and simply concatenate all stored batches on every replay.
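
A rough sketch of that workaround (the import path and the internal attribute names replay_buffers, _storage and num_added are from memory, so please verify them against your RLlib version):

    from ray.rllib.execution.replay_buffer import MultiAgentReplayBuffer
    from ray.rllib.policy.sample_batch import MultiAgentBatch, SampleBatch

    class FullReplayBuffer(MultiAgentReplayBuffer):
        """Replays the entire buffer instead of replay_batch_size samples."""

        def replay(self):
            if self.num_added == 0:
                return None
            # Concatenate every stored batch per policy instead of sampling
            # `replay_batch_size` timesteps.
            policy_batches = {
                pid: SampleBatch.concat_samples(list(buf._storage))
                for pid, buf in self.replay_buffers.items()
                if len(buf._storage) > 0
            }
            env_steps = sum(b.count for b in policy_batches.values())
            return MultiAgentBatch(policy_batches, env_steps)

You would then construct FullReplayBuffer in place of the stock buffer and could drop the ConcatBatches step entirely.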

We are working on making things with replay buffers simpler.
I hope this helps!