That depends on your configuration. Could you post an informative piece of your code, if not too lengthy?
Normally, your training produces metrics that are displayed during training. These metrics are snapshots of how your policy is performing. In this case, you will not have a separate environment for evaluation.
That being said, you can stop training every now and then and evaluate your policy. Or you can even evaluate in parallel, while training. Normally, you can safely use the configuration option evaluation_num_workers. Evaluation workers will have their own environments. While this also depends on your environment, in the best case multiple instances of it should produce independent and very comparable experiences.
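As a rough sketch, the parallel-evaluation setup might look like the config below. It assumes Ray RLlib's common config keys (`evaluation_num_workers` is from your question; `evaluation_interval` and `evaluation_config` are my assumption about what you'd pair it with), and the values are only illustrative:

```python
# Hedged sketch of an RLlib-style config dict enabling parallel evaluation.
# Key names follow Ray RLlib's common configuration; check your RLlib
# version's docs, since exact keys have changed across releases.
config = {
    "num_workers": 2,              # workers used for training rollouts
    "evaluation_num_workers": 1,   # separate workers (own env instances) for evaluation
    "evaluation_interval": 5,      # run evaluation every 5 training iterations
    "evaluation_config": {
        "explore": False,          # typically disable exploration when evaluating
    },
}
```

With `evaluation_num_workers` > 0, evaluation rollouts run in their own workers and environments, so they don't perturb the training sampling.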