Evaluation results don't change at all across any of my runs?

How severely does this issue affect your experience of using Ray?

  • Medium: It makes my task significantly harder to complete, but I can work around it.

Across multiple trials of my experiment, my evaluation reward and episode length always stay at the same constant value.


I am using Ray 2.0.0. Could this be caused by my evaluation configuration? Do I have to set explore to True? Even so, this behaviour confuses me: each evaluation run should be different, so how can my policy get exactly the same result every time?

            evaluation_interval=10,
            evaluation_duration=10,
            evaluation_duration_unit='episodes',
            # evaluation_sample_timeout_s=180.0,
            evaluation_parallel_to_training=False,
            # Overrides applied to the evaluation worker(s) only.
            evaluation_config={
                # explore=False: take the deterministic (most likely) action.
                'explore': False,
                'exploration_config': {'type': 'StochasticSampling'},
            },
            evaluation_num_workers=1,
            # custom_evaluation_function=None,
            always_attach_evaluation_results=True,
            # in_evaluation=False,
            # sync_filters_on_rollout_workers_timeout_s=60.0,
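One possible contributor, shown as a toy rather than RLlib code: as I understand it, with 'explore': False, StochasticSampling takes the most likely action instead of sampling one. If the environment also resets and steps deterministically, every evaluation episode run with the same weights then replays the same trajectory. The environment, the fixed policy logits and the names select_action and run_episode below are all hypothetical; the sketch only contrasts greedy and sampled action selection.

```python
import math
import random


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, explore, rng):
    """Hypothetical stand-in for StochasticSampling.

    explore=False -> deterministic argmax action (greedy evaluation).
    explore=True  -> sample an action from the categorical distribution.
    """
    if not explore:
        return max(range(len(logits)), key=lambda a: logits[a])
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def run_episode(explore, rng, max_steps=50):
    """Toy deterministic environment: reach position 10 to finish.

    Action 1 moves forward, action 0 stays put. The fixed policy prefers
    moving forward, but under sampling it sometimes stays put.
    """
    logits = [0.0, 1.0]  # hypothetical fixed policy output
    position, steps = 0, 0
    while position < 10 and steps < max_steps:
        position += select_action(logits, explore, rng)
        steps += 1
    return steps  # episode length


rng = random.Random(0)
greedy_lengths = [run_episode(explore=False, rng=rng) for _ in range(10)]
sampled_lengths = [run_episode(explore=True, rng=rng) for _ in range(10)]
print("explore=False:", greedy_lengths)
print("explore=True: ", sampled_lengths)
```

With greedy selection all ten episode lengths are identical (10 steps each), while sampling gives varying lengths. If deterministic action selection is indeed the cause, setting 'explore': True in evaluation_config would be the corresponding change on the RLlib side.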

Please advise.

This is hard to say without knowing your environment.

  1. If your environment’s episodes never end, they are killed after evaluation_sample_timeout_s. Is your env slow? Could 0.5 steps/second be realistic?
  2. Does your environment generally finish after about 1.5k steps?
  3. How do these metrics look during training, as opposed to evaluation?
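As I read it, the 0.5 steps/second question is a back-of-the-envelope check rather than a config option: evaluation only waits evaluation_sample_timeout_s seconds, so a slow environment can finish only so many steps in that window. A minimal sketch, assuming a constant stepping rate and that the timeout has to cover one full episode; the helper names are hypothetical, and 180.0 s is the value shown (commented out) in the config above.

```python
def steps_within_timeout(steps_per_second: float, timeout_s: float) -> float:
    """Most env steps that fit before the timeout expires,
    assuming the environment steps at a constant rate."""
    return steps_per_second * timeout_s


def required_timeout_s(episode_length: int, steps_per_second: float) -> float:
    """Timeout needed for one full episode at a constant stepping rate."""
    return episode_length / steps_per_second


rate = 0.5  # steps per second, the rate asked about above
print(steps_within_timeout(rate, 180.0))  # 90.0 steps fit in 180 s
print(required_timeout_s(1500, rate))     # 3000.0 s for a 1.5k-step episode
```

Under these assumptions, a 1.5k-step episode at 0.5 steps/second would need far more than 180 s, so it would be cut off long before it ends.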

Your evaluation config is, generally speaking, fine.


  1. I would say the environment is pretty slow, because I have 1000 agents that have to travel along a 5000 m road!

I didn’t understand the part about 0.5 steps/second being realistic. Is that something I set in my config? Please advise.

  2. My environment can take anywhere from ~1000s to ~4000s to complete; it is basically a bunch of cars moving along lanes to reach the end of the road.

  3. During training they look fine; it even seems to be learning quite well (the goal is to reduce the episode length), but the evaluation graphs don’t budge!

Looking at your episode_len_mean, what strikes me is that it a) changes and b) is always higher than the evaluation episode length. What happens if you set evaluation_sample_timeout_s=7200? Cheers


Hello, sorry again. I tried setting evaluation_sample_timeout_s to 7200 and this is what I got (the orange lines are the run before resuming; the blue lines are after resuming, following the same error I mentioned in my other posts):

I no longer understand how this system works.