Evaluation results don't seem to change at all, in any of my runs?

How severely does this issue affect your experience of using Ray?

  • Medium: It causes significant difficulty in completing my task, but I can work around it.

In multiple trials of my experiment, my evaluation reward / length seem to always stay at a constant value.

[screenshot: evaluation_len_mean showing no change]

I am using Ray 2.0.0, and I wonder if it's because of my evaluation configuration options. Do I have to set explore to True? Either way the behaviour confuses me: each evaluation run should be different, yet my policy seems to produce the same result every time.

.evaluation(
    evaluation_interval=10,
    evaluation_duration=10,
    evaluation_duration_unit='episodes',
    # evaluation_sample_timeout_s=180.0,
    evaluation_parallel_to_training=False,
    evaluation_config={
        'explore': False,
        'exploration_config': {'type': 'StochasticSampling'},
    },
    evaluation_num_workers=1,
    # custom_evaluation_function=None,
    always_attach_evaluation_results=True,
    # in_evaluation=False,
    # sync_filters_on_rollout_workers_timeout_s=60.0,
    evaluation_sample_timeout_s=3600,
)
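For reference, a minimal self-contained sketch of what the explore flag changes during evaluation; PPO and the CartPole env are stand-ins for illustration only, not the actual setup:

from ray.rllib.algorithms.ppo import PPOConfig  # PPO is an assumption; any algorithm works

config = (
    PPOConfig()
    .environment(env="CartPole-v1")  # placeholder env, not the traffic environment
    .evaluation(
        evaluation_interval=10,
        evaluation_duration=10,
        evaluation_duration_unit="episodes",
        evaluation_num_workers=1,
        evaluation_sample_timeout_s=3600,
        evaluation_config={
            # With "explore": False the policy acts deterministically during
            # evaluation, so a deterministic env can produce identical episodes
            # every time; the exploration_config entry is then effectively unused.
            # Setting "explore": True samples actions via StochasticSampling instead.
            "explore": True,
            "exploration_config": {"type": "StochasticSampling"},
        },
    )
)
algo = config.build()

Whether deterministic or stochastic evaluation is the right choice depends on what you want to measure; deterministic evaluation is expected to look flat if the environment itself has no randomness.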

Please advise.

This is hard to say without knowing your environment.
But:

  1. If your environment's episodes never end, they get cut off after evaluation_sample_timeout_s. Is your env slow? Could 0.5 steps/second be realistic? (A rough back-of-the-envelope check follows this list.)
  2. Do you have an environment that generally finishes after 1.5k steps?
  3. How do these metrics look while training and not while evaluating?
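A rough back-of-the-envelope check for point 1; the throughput and episode length below are assumptions for illustration, not measurements from this environment:

steps_per_second = 0.5      # assumed env throughput on the single eval worker
episode_len_steps = 1500    # assumed typical episode length (~1.5k steps)
timeout_s = 3600            # evaluation_sample_timeout_s from the config above

seconds_per_episode = episode_len_steps / steps_per_second  # 3000 s per episode
episodes_in_timeout = timeout_s / seconds_per_episode        # ~1.2 episodes
print(f"{seconds_per_episode:.0f} s per episode -> ~{episodes_in_timeout:.1f} episodes fit in the timeout")

If the timeout expires before the requested 10 episodes finish, the evaluation results are based on whatever was collected up to that point, which can make the reported metrics look frozen.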

Your evaluation config is, generally speaking, fine.

Cheers

  1. I would say the environment is pretty slow because I have 1000 agents that have to pass through a 5000 m road!

I didn’t understand the part about 0.5 steps/second being realistic - am I setting this in my config? Please advise.

  2. My environment can take anywhere from ~1000s to ~4000s for completion, basically a bunch of cars moving on lanes to reach the end of the road.

  3. They look fine; it even seems to be learning quite well (since the goal is to reduce episode length), but the evaluation graphs don't budge!
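For comparing the two curves, a small sketch of where each metric lives in the result dict returned by training (assuming the algo object from the sketch above and always_attach_evaluation_results=True):

result = algo.train()
# Training metrics sit at the top level of the result dict; evaluation metrics
# are nested under the "evaluation" key when evaluation results are attached.
print("training   episode_len_mean:", result["episode_len_mean"])
print("evaluation episode_len_mean:", result["evaluation"]["episode_len_mean"])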

Looking at your episode_len_mean, it strikes me as important that it (a) changes and (b) is always higher than the evaluation episode length. What happens if you set evaluation_sample_timeout_s=7200? Cheers
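Under the same assumed 0.5 steps/s throughput as the earlier sketch, doubling the timeout roughly doubles how many evaluation steps can be collected before the cut-off:

steps_per_second = 0.5  # illustrative assumption, not a measured value
for timeout_s in (3600, 7200):
    print(timeout_s, "s ->", int(timeout_s * steps_per_second), "steps collectable")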


Hello, sorry again. I tried setting evaluation_sample_timeout_s to 7200 and this is what I got (the orange lines are the run before resume, the blue ones are from resuming after the same error mentioned in my other posts):

[screenshot: evaluation metrics, orange = run before resume, blue = run after resuming]
I don't understand how this system works anymore.