How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am simply trying to evaluate a model from a checkpoint for 10 episodes. I have been unable to use Tune for evaluation, as controlling the number of episodes it runs seems impossibly complicated, and I have no idea why.
Instead, I have moved to using trainer.evaluate() after restoring my checkpoint with trainer.restore(). This does what I need, except that I lose the auto-generated result information from Tune, and it always runs one more episode than the number defined in the evaluation_duration config parameter.
Why is this the case? And how can I fix it? These are the evaluation-related configuration options I have set:
# Evaluation settings
policy_conf['evaluation_interval'] = 0
policy_conf['evaluation_duration'] = 10 # change to 1 episode?
policy_conf['evaluation_duration_unit'] = 'episodes'
policy_conf['evaluation_parallel_to_training'] = False
policy_conf['in_evaluation'] = False
policy_conf['evaluation_config'] = {}
policy_conf['evaluation_num_workers'] = 1
policy_conf['custom_eval_function'] = None
policy_conf['always_attach_evaluation_results'] = True
policy_conf['sample_async'] = False
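For context, here is a minimal sketch of what I am doing. The algorithm class and checkpoint path are placeholders; the evaluation settings match those above:

```python
# Minimal sketch (placeholder trainer class and checkpoint path).
# Only the evaluation-related settings are shown here.
eval_conf = {
    "evaluation_interval": 0,
    "evaluation_duration": 10,        # expected: exactly 10 episodes
    "evaluation_duration_unit": "episodes",
    "evaluation_parallel_to_training": False,
    "evaluation_num_workers": 1,
}

# from ray.rllib.agents.ppo import PPOTrainer   # PPO used as an example
# trainer = PPOTrainer(config={**base_conf, **eval_conf})
# trainer.restore("/path/to/checkpoint")        # placeholder path
# results = trainer.evaluate()                  # runs 11 episodes, not 10
```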
Let me know if any other information is required. I would appreciate any help on the matter. Thank you.