I have a question regarding the evaluation of my trained RL algorithm.
Is it right that when I start training, RLlib creates two environments, one for training and one for evaluation?
Can I somehow use exactly the same environment for the evaluation?
That depends on your configuration. Could you post an informative piece of your code, if not too lengthy?
Normally, your training produces metrics that are displayed during training. These metrics are snapshots of how your policy performs. In this case, you will not have a separate environment for evaluation.
That being said, you can stop training every now and then and evaluate your policy, or you can even evaluate in parallel while training. Normally, you can safely use the configuration option evaluation_num_workers. Evaluation workers will have their own environments. While this can also depend on your environment, in the best case multiple instances of your environment should produce independent and very comparable experiences.
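For illustration, here is a minimal sketch of such a setup, assuming the old-style dict config and PPO as the algorithm (the env name and the numbers are just placeholders):

```python
from ray import tune

config = {
    "env": "CartPole-v1",         # any registered or Gym environment
    "num_workers": 2,             # rollout workers (and envs) used for training
    "evaluation_interval": 5,     # run evaluation every 5 training iterations
    "evaluation_num_workers": 1,  # dedicated evaluation workers with their own envs
    "evaluation_config": {
        "explore": False,         # typically evaluate the policy greedily
    },
}

tune.run("PPO", config=config, stop={"training_iteration": 20})
```

With evaluation_num_workers set, the evaluation rollouts run on their own worker set and do not touch the training environments.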
RLlib does not support using the same environment for both training and evaluation. In the rollout worker setup it creates two separate sets of workers, one for each mode.
Here is where it creates the evaluation workers with their own environments:
> RLlib does not support using the same environment for both training and evaluation. In the rollout worker setup it creates two separate sets of workers, one for each mode.
@mannyv I have worked around this with LGSVL, where it makes no sense to have separate environments per worker, because the simulation consumes so many resources.
In my case it looks to RLlib as if there are separate environments. Not sure if that is what the OP is aiming at.
I am using tune.register_env("my_env", Environment) and
config = {…, "env": "my_env", "evaluation_config": {"env": "my_env"}}
Even if I am not defining an env in the evaluation_config, Ray creates separate environments.
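One way to see this for yourself (purely a sketch; CountingEnv, CartPole-v1 and the config values are assumptions, not the OP's actual code) is to log every environment construction and watch the training and evaluation workers each create their own instances:

```python
import os

import gym
from ray import tune
from ray.tune.registry import register_env


class CountingEnv(gym.Env):
    """Wraps a simple env and logs every construction with the process ID."""

    def __init__(self, env_config):
        print(f"Environment created in process {os.getpid()}")
        self._env = gym.make("CartPole-v1")
        self.observation_space = self._env.observation_space
        self.action_space = self._env.action_space

    def reset(self):
        return self._env.reset()

    def step(self, action):
        return self._env.step(action)


register_env("my_env", lambda cfg: CountingEnv(cfg))

config = {
    "env": "my_env",
    "num_workers": 1,
    "evaluation_interval": 1,
    "evaluation_num_workers": 1,
    "evaluation_config": {"env": "my_env"},
}
tune.run("PPO", config=config, stop={"training_iteration": 1})
```

You should see several "Environment created" lines, one per instance that the training and evaluation workers build.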
As Bimser said, RLlib does not support using the same environment (if you keep it simple). You can still instantiate an environment before your tune.run() call and pass the object ID to your config. Then write another environment that takes that object ID and acts on the original environment through it. This only makes sense if your environment is rather slow and the overhead of accessing it is still comparably low.
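I have not done this with LGSVL myself, but a rough sketch of the idea could look like the following: the single expensive simulator lives in a Ray actor, and every env instance RLlib creates only forwards calls to it (SharedSimulator and ProxyEnv are hypothetical names, and CartPole-v1 just stands in for the real simulator):

```python
import gym
import ray
from ray import tune
from ray.tune.registry import register_env


@ray.remote
class SharedSimulator:
    """Owns the one heavyweight simulator; all env instances talk to it."""

    def __init__(self):
        self.env = gym.make("CartPole-v1")  # stand-in for the real simulator

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class ProxyEnv(gym.Env):
    """Thin env that forwards every call to the shared simulator actor."""

    def __init__(self, env_config):
        self.sim = env_config["sim_handle"]  # actor handle passed via the config
        probe = gym.make("CartPole-v1")      # only used to copy the spaces
        self.observation_space = probe.observation_space
        self.action_space = probe.action_space

    def reset(self):
        return ray.get(self.sim.reset.remote())

    def step(self, action):
        return ray.get(self.sim.step.remote(action))


ray.init()
sim_handle = SharedSimulator.remote()
register_env("proxy_env", lambda cfg: ProxyEnv(cfg))

config = {
    "env": "proxy_env",
    "env_config": {"sim_handle": sim_handle},
    "num_workers": 0,  # keep access to the single simulator sequential
}
tune.run("PPO", config=config, stop={"training_iteration": 1})
```

RLlib still constructs several ProxyEnv objects, but they all act on the one simulator, which is the effect the workaround is after.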