Proper way to load and evaluate trained agent

What is the proper way to load a trained agent and run one or more evaluation episodes?

What I want to do is:

  1. Load the agent from disk
  2. Set it in evaluation mode (no exploration etc)
  3. Run one episode on a custom environment

What I’m doing now is loading the trainer, extracting its policy, and doing something like this, but I’m wondering if there’s a better way.

Hi @fedetask,
and welcome to the discussion board.

The simplest form is to restore the Trainer with which you trained your policy and call .evaluate() on it.

Okay, that looks fast. However, when I call trainer.evaluate() as you suggested, I get the following error:

ValueError: Cannot evaluate w/o an evaluation worker set in the Trainer or w/o an env on the local worker!
Try one of the following:
 1) Set `evaluation_interval` >= 0 to force creating a separate evaluation worker set.
 2) Set `create_env_on_driver=True` to force the local (non-eval) worker to have an environment to evaluate on.

I tried setting trainer.config['create_env_on_driver'] = True before calling evaluate(), but it doesn’t change anything. I also assumed option 2) doesn’t apply to my case, since I’m not training.

Could you print your Trainer's config['evaluation_interval'] and config['evaluation_duration'], please?

config['evaluation_interval'] is None, and evaluation_duration is not even present in the config. If I try to set config['evaluation_duration'] to some arbitrary value, I get:

Exception: Unknown config parameter 'evaluation_duration' 

Problem solved!

I was setting config['create_env_on_driver'] = True before calling evaluate() but after the trainer had already been loaded, so I guess it had no effect. Setting it in the config used to create the trainer fixed it.
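For anyone landing here with the same error, the order that worked can be sketched roughly as below. This is only a sketch under assumptions: it targets an older Ray 1.x release where `ray.rllib.agents.ppo.PPOTrainer` exists, PPO is an assumed algorithm (the thread does not say which one was used), and `make_eval_config` and `restore_and_evaluate` are hypothetical helper names, not part of RLlib.

```python
import copy


def make_eval_config(train_config):
    """Return a copy of the training config with an env on the local worker."""
    config = copy.deepcopy(train_config)
    # Set this BEFORE the Trainer is constructed. The workers (and their
    # environments) are built when the Trainer is created, so editing
    # trainer.config afterwards does not add an environment.
    config["create_env_on_driver"] = True
    return config


def restore_and_evaluate(train_config, checkpoint_path):
    """Rebuild the Trainer, restore the checkpoint, then run evaluation."""
    # Imported lazily so make_eval_config() works without Ray installed.
    import ray
    from ray.rllib.agents.ppo import PPOTrainer  # assumption: PPO was used

    ray.init(ignore_reinit_error=True)
    trainer = PPOTrainer(config=make_eval_config(train_config))
    trainer.restore(checkpoint_path)
    return trainer.evaluate()
```

You would call `restore_and_evaluate(...)` with the same config you trained with and the path to your own checkpoint. The other option from the error message, setting `evaluation_interval` so that a separate evaluation worker set is created, should work the same way as long as it is also set before the Trainer is constructed.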

Thank you very much for your help!


@fedetask,

One small comment: my understanding is that if you trained with exploration (stochastic actions), you can only expect the policy to perform at its best when exploration is also on during evaluation.

Theory aside, I have tested this on my policies and environments, and I have consistently seen performance deteriorate when the explore setting differs between training and testing. YMMV.
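To make the explore setting a bit more concrete: for a stochastic policy, exploring typically means sampling from the action distribution, while not exploring typically means taking the most likely action. The toy below is plain Python, not RLlib code. `select_action` and the logits are made up for illustration.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, explore, rng=random):
    """Sample an action when exploring, otherwise take the most likely one."""
    probs = softmax(logits)
    if explore:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)


logits = [1.0, 1.2, 0.9]  # made-up scores for three actions
rng = random.Random(0)  # seeded so the run is repeatable
greedy = [select_action(logits, explore=False) for _ in range(1000)]
sampled = [select_action(logits, explore=True, rng=rng) for _ in range(1000)]
print("explore=False picks:", sorted(set(greedy)))
print("explore=True picks: ", sorted(set(sampled)))
```

When the scores are close, as here, the greedy mode always repeats one action while the sampling mode mixes all of them. That could be one reason a policy trained with sampling behaves differently when evaluated greedily, though how much it matters will depend on the environment.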
