How does environment creation work in RLlib? Specifically, why do I see two environments being created when training?

For context, I am doing multi-agent learning with a PPO config, with the number of agents varying during training. I am also just trying to understand RLlib a little better. When I start training (I am training manually, without Tune at the moment), I can see that two environments are created and that both of them add agents as training progresses. Why are there two environments? Also, given that my simulation is resource-intensive, how would I keep the resource use of these environments down?
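
Here is a minimal sketch of roughly what my setup looks like (`MySimEnv`, `"my_sim"`, and `"shared_policy"` are placeholders, not my actual simulation or policy names):

```python
# Minimal sketch of my setup; env and policy names are placeholders.
import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env

from my_sim import MySimEnv  # hypothetical: my MultiAgentEnv subclass


def env_creator(env_config):
    return MySimEnv(env_config)


register_env("my_sim", env_creator)

config = (
    PPOConfig()
    .environment(env="my_sim")
    # One remote rollout worker; this is where I expected sampling to happen.
    .rollouts(num_rollout_workers=1)
    .multi_agent(
        policies={"shared_policy"},
        # Every agent, including ones that appear during training,
        # maps to the same shared policy.
        policy_mapping_fn=lambda agent_id, episode, worker, **kwargs: "shared_policy",
    )
)

ray.init()
algo = config.build()

# Manual training loop, no Tune.
for i in range(10):
    result = algo.train()
    print(i, result["episode_reward_mean"])
```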