That depends on your configuration. Could you post an informative piece of your code, if not too lengthy?
Normally, your training produces metrics that are displayed during training. These metrics are snapshots of how your policy is performing. In this case, you will not have a separate environment for evaluation.
That being said, you can stop training every now and then and evaluate your policy. Or you can even evaluate in parallel, while training. Normally, you can safely use the configuration option evaluation_num_workers. Evaluation workers will have their own environments. While this also depends on your environment, in the best case multiple instances of it should produce independent and very comparable experiences.
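As a rough sketch, the parallel-evaluation setup might look like the config below. It assumes Ray RLlib's common config keys (`evaluation_num_workers` is from your question; `evaluation_interval` and `evaluation_config` are my assumption about what you'd pair it with), and the values are only illustrative:

```python
# Hedged sketch of an RLlib-style config dict enabling parallel evaluation.
# Key names follow Ray RLlib's common configuration; check your RLlib
# version's docs, since exact keys have changed across releases.
config = {
    "num_workers": 2,              # workers used for training rollouts
    "evaluation_num_workers": 1,   # separate workers (own env instances) for evaluation
    "evaluation_interval": 5,      # run evaluation every 5 training iterations
    "evaluation_config": {
        "explore": False,          # typically disable exploration when evaluating
    },
}
```

With `evaluation_num_workers` > 0, evaluation rollouts run in their own workers and environments, so they don't perturb the training sampling.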