Understanding train_batch_size in multiagent RL

Am I right in assuming that when I have multiple policies controlling the same agent, the policy updates at the end of one training iteration (trainer.train()) will only use the experience collected while that particular policy was active?
That would mean, e.g., if I have train_batch_size=4000 with PPO and 2 policies controlling one agent (each episode one of them is chosen randomly), then the SGD epoch would have size ~2000 for each policy, since each is active in about half of the episodes (assuming more or less constant episode lengths).
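For reference, here is roughly the setup I have in mind (the env name is a placeholder, and the exact policy_mapping_fn signature may differ depending on the RLlib version):

```python
import random

from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.policy.policy import PolicySpec


def policy_mapping_fn(agent_id, episode, worker=None, **kwargs):
    # Pick one of the two policies per episode, keyed on the episode ID
    # so the same policy stays in control for the whole episode.
    rng = random.Random(episode.episode_id)
    return rng.choice(["policy_a", "policy_b"])


config = {
    "env": "MyMultiAgentEnv-v0",   # placeholder for my registered env
    "train_batch_size": 4000,      # total timesteps collected per trainer.train()
    "multiagent": {
        "policies": {
            "policy_a": PolicySpec(),  # spaces/config inferred from the env
            "policy_b": PolicySpec(),
        },
        "policy_mapping_fn": policy_mapping_fn,
    },
}

trainer = PPOTrainer(config=config)
result = trainer.train()
# Question: does each policy's SGD pass now only see its own ~2000 of the 4000 timesteps?
```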