Okay, that looks fast. However, I'm trying to do what you said, and when I call trainer.evaluate() I get the following error:
ValueError: Cannot evaluate w/o an evaluation worker set in the Trainer or w/o an env on the local worker!
Try one of the following:
1) Set `evaluation_interval` >= 0 to force creating a separate evaluation worker set.
2) Set `create_env_on_driver=True` to force the local (non-eval) worker to have an environment to evaluate on.
I tried to set trainer.config['create_env_on_driver'] = True before calling evaluate(), but it doesn't change anything. And I guess point 1) isn't related to my case, as I'm not training.
config['evaluation_interval'] is None and evaluation_duration is not even present in the config. If I try to set config['evaluation_duration'] to some random value, I get
I was setting config['create_env_on_driver'] = True before calling evaluate(), but only after loading the trainer, so I guess it didn't have any effect.
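Right, the Trainer takes a copy of the config when it is constructed, so the flag has to be in the config dict before the trainer is built/restored. A minimal sketch of what that could look like, assuming the Trainer-API era of RLlib, a PPO trainer, and placeholder env name / checkpoint path:

import ray
from ray.rllib.agents import ppo  # assuming PPO; adjust for your algorithm

ray.init()

config = ppo.DEFAULT_CONFIG.copy()
config["env"] = "CartPole-v1"          # placeholder env, use your own
# These flags only take effect if they are set *before* the Trainer is
# constructed; mutating trainer.config afterwards does nothing.
config["create_env_on_driver"] = True  # give the local worker an env to evaluate on
# config["evaluation_interval"] = 1    # alternative: create a separate evaluation worker set

trainer = ppo.PPOTrainer(config=config)
trainer.restore("/path/to/checkpoint")  # placeholder checkpoint path
results = trainer.evaluate()
print(results["evaluation"]["episode_reward_mean"])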
One small comment: my understanding is that if you trained with exploration/stochastic actions, then you can only expect your policy to produce its best actions when exploration is also on during evaluation.
Theory aside, I have tested this on my own policies and environments, and it has consistently been the case for me that performance deteriorates if the explore setting differs between training and testing. YMMV.
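If you want to make that choice explicit rather than relying on defaults, exploration during evaluation can be pinned via the evaluation overrides. A sketch along the same lines as the config above (same assumptions, env name is a placeholder):

from ray.rllib.agents import ppo  # same Trainer-API era of RLlib as above (assumption)

config = ppo.DEFAULT_CONFIG.copy()
config["env"] = "CartPole-v1"          # placeholder env
config["create_env_on_driver"] = True
# Keys in `evaluation_config` are merged on top of the main config for
# evaluation only, so the exploration behaviour can be pinned here.
config["evaluation_config"] = {
    "explore": True,  # keep stochastic actions, matching training; False forces greedy actions
}

trainer = ppo.PPOTrainer(config=config)
# trainer.evaluate() will now sample actions the same way training did.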