Trainer.evaluate() runs 1 more episode than defined in evaluation_duration

I am simply trying to evaluate a model from a checkpoint for 10 episodes. I am unable to use Tune for evaluation, as it seems impossibly complicated to control the number of episodes it will run, and I have no idea why.

Instead, I have moved to try using trainer.evaluate() after restoring my checkpoint using trainer.restore(). This seems to do what I need, minus the auto-generated result information (in Tune) and the fact that it always runs 1 more episode than that defined in the evaluation_duration config parameter.

Why is this the case? And how can I fix it? These are the evaluation-related configuration options I have set:

# Evaluation settings
policy_conf['evaluation_interval'] = 0
policy_conf['evaluation_duration'] = 10 # change to 1 episode?
policy_conf['evaluation_duration_unit'] = 'episodes'
policy_conf['evaluation_parallel_to_training'] = False
policy_conf['in_evaluation'] = False
policy_conf['evaluation_config'] = {}
policy_conf['evaluation_num_workers'] = 1
policy_conf['custom_eval_function'] = None
policy_conf['always_attach_evaluation_results'] = True
policy_conf['sample_async'] = False

Let me know if any other information is required. I would appreciate any help at all on the matter. Thank you.

Hi @hridayns,

Have you tried this in the latest release, 2.0? There were some issues with 1.13 running too many evaluations (see "[RLlib] Excessive evaluation if rollout_fragment_length < timesteps_per_iteration", Issue #27821 in ray-project/ray on GitHub), but that has been fixed in the latest release.
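If it helps to confirm whether the extra episode is still there after upgrading, here is a rough sketch of a check you could run right after trainer.restore(). The helper names count_eval_episodes and check_eval_duration are made up for illustration, and I am assuming the dict returned by trainer.evaluate() nests its metrics under "evaluation" with an "episodes_this_iter" entry; adjust the keys if your version reports them differently.

```python
from typing import Any, Dict


def count_eval_episodes(trainer: Any) -> int:
    """Run one evaluation round and return how many episodes it reports."""
    results: Dict[str, Any] = trainer.evaluate()
    # Assumption: metrics are nested under "evaluation" and include
    # "episodes_this_iter". Change these keys if your version differs.
    return int(results["evaluation"]["episodes_this_iter"])


def check_eval_duration(trainer: Any, config: Dict[str, Any]) -> int:
    """Return (reported - configured) episodes; 0 means the setting was honored."""
    if config.get("evaluation_duration_unit", "episodes") != "episodes":
        raise ValueError("this check only applies when the unit is 'episodes'")
    return count_eval_episodes(trainer) - int(config["evaluation_duration"])
```

Calling check_eval_duration(trainer, policy_conf) with your restored trainer should return 0 if evaluation_duration is respected; a return value of 1 would reproduce what you describe.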

