Episode_reward_mean that ASHA Scheduler expects not found in results

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi,
I am attempting to do hyperparameter optimization (HPO) with Ray Tune on a Soft Actor-Critic (SAC) RL agent that is being trained on a custom environment. I am using Ray 2.40 and Python 3.11.

        from ray import tune
        from ray.rllib.algorithms.sac import SACConfig
        from ray.tune import TuneConfig
        from ray.tune.schedulers import ASHAScheduler
        from ray.tune.search.optuna import OptunaSearch

        sac_config = (
            SACConfig()
            .environment(env=Env.custom_env,
                         env_config=env_config,
                         disable_env_checking=True)
            .env_runners(num_env_runners=5,
                         num_gpus_per_env_runner=0,
                         num_cpus_per_env_runner=2)
            .learners(num_learners=1,
                      num_gpus_per_learner=0.25)
            .framework("torch")
            .training(
                initial_alpha=0.2,
                target_entropy='auto',
                twin_q=True
            )
        )

        # Define the search space
        search_space = {
            "gamma": tune.uniform(0.9, 0.999),  # Discount factor
            "actor_lr": tune.loguniform(1e-5, 1e-3),  # Learning rate
            "critic_lr": tune.loguniform(1e-5, 1e-3),  # Learning rate
            "train_batch_size": tune.choice([128, 256, 512]),  # Batch size
            "tau": tune.uniform(0.005, 0.05),  # Soft update coefficient
        }

        search_alg = OptunaSearch(
            metric="episode_reward_mean",
            mode="max"
        )

        # ASHA scheduler for early stopping
        asha_scheduler = ASHAScheduler(
            metric="episode_reward_mean",
            mode="max",
            max_t=150,
            grace_period=100,
            reduction_factor=2,
        )

        tuner = tune.Tuner(
            "SAC",
            param_space={**sac_config.to_dict(), **search_space},
            tune_config=TuneConfig(
                num_samples=1,  # Number of trials
                search_alg=search_alg,
                scheduler=asha_scheduler,  # Use ASHA for early stopping
                max_concurrent_trials=4,
            )
        )

        results = tuner.fit()

When I run the above code I get the following error:

ValueError: Trial returned a result which did not include the specified metric(s) `episode_reward_mean` that `AsyncHyperBandScheduler` expects 

I looked at the result JSON and saw that there was no key indicating anything that looked like a reward or “return”. I am guessing something changed in Ray 2.40.

The exact same code ran without a glitch on Ray 2.10 (although I did have to remove the env_runners and learners settings from the config, as Ray 2.10 does not have learners and env runners).

Hoping one of you guys can help me out with this. :slight_smile:

In recent versions of RLlib the metric was renamed to episode_return_mean.

So I renamed it to episode_return_mean, and I got the same error:

ValueError: Trial returned a result which did not include the specified metric(s) `episode_return_mean` that `AsyncHyperBandScheduler` expects. Make sure your calls to `tune.report()` include the metric, or set the TUNE_DISABLE_STRICT_METRIC_CHECKING environment variable to 1

I closely inspected the created config dict, and indeed there was no instance of any kind of “return” or “reward”.

I should also note that I had to remove the env_runners and learners settings from my sac_config, as they raised a separate error where the actor kept getting killed during its creation.

I suspect that because I am not explicitly setting env_runners, and episode_return_mean is a key nested inside env_runners, this ValueError: .....episode_return_mean error keeps popping up.
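
If that is the case, I would presumably need to point the searcher and scheduler at the nested key. A rough sketch of what I mean, assuming the new API stack reports the return under results["env_runners"] and that Tune accepts "/"-separated paths into nested result dicts:

# Sketch only: reference the nested metric with a "/"-separated path.
NESTED_METRIC = "env_runners/episode_return_mean"

search_alg = OptunaSearch(metric=NESTED_METRIC, mode="max")

asha_scheduler = ASHAScheduler(
    metric=NESTED_METRIC,
    mode="max",
    max_t=150,
    grace_period=100,
    reduction_factor=2,
)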

@araman5,

The most straightforward way of checking this is to do something like this:

algo = sac_config.build()
results = algo.train()
print(results)

Find the metric you are interested in that was printed to the terminal and use that.
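
If the printed result dict is large, a small helper can list the flattened key paths so anything reward- or return-like is easy to spot (flatten_keys below is just something written for this post, not an RLlib/Tune API):

def flatten_keys(d, prefix=""):
    # Recursively yield "a/b/c"-style key paths of a nested result dict.
    for k, v in d.items():
        path = f"{prefix}{k}"
        if isinstance(v, dict):
            yield from flatten_keys(v, prefix=path + "/")
        else:
            yield path

algo = sac_config.build()
results = algo.train()
for key in sorted(flatten_keys(results)):
    if "return" in key.lower() or "reward" in key.lower():
        print(key)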

So that’s what I ended up doing with a simple Pendulum environment:

sac_config = (
    SACConfig()
    .environment(env="Pendulum-v1")
    # .environment(env=Env.Live_Streaming,
    #              env_config=env_config,
    #              disable_env_checking=True)
    .env_runners(num_env_runners=5)
    .learners(num_learners=1,
              num_gpus_per_learner=0.25)
    .framework("torch")
    .training(
        tau=0.02,
        train_batch_size=128,
        gamma=0.97,
        initial_alpha=0.2,
        target_entropy='auto',
        twin_q=True,
        actor_lr=1.0e-5,
        critic_lr=1.0e-5
    )
)

algo = sac_config.build()
results = algo.train()

print(results)

In the output result there is nothing that looks like a reward or return.

Update:

Instead of printing the result after just one call of algo.train(), I looked at the result JSON after five calls of algo.train(), and now I see results['env_runners']['agent_episodes_returns_mean']['default_policy'].

However, for my own custom environment, I don’t see agent_episodes_returns_mean.

Hi @araman5,

You will not have any return/reward metrics until an episode terminates. Similarly, the return/reward will not update until new episodes terminate. Finally, the return/reward is a moving average over the previous 100 episodes.

The logic for when one training iteration completes is at the link below. Perhaps you can configure one of those settings to suit your needs.

Thanks a lot. This might just prove to be the solution. Appreciate your help, @mannyv

So I figure I would need to change min_sample_timesteps_per_t to something more suitable for my custom environment, which has a fixed episode length of 180.

I don’t see how I can update that from the sac_config dict, though. I see training_intensity in the documentation, but that doesn’t seem to be it.

It is in reporting().
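
For example, something along these lines (a sketch only, assuming these reporting() options are available on AlgorithmConfig in Ray 2.40; the values are just illustrative for a fixed episode length of 180):

sac_config = sac_config.reporting(
    # Require at least one full episode's worth of sampled env steps
    # before a training iteration reports results.
    min_sample_timesteps_per_iteration=180,
    # Average returns over fewer episodes so the metric shows up sooner.
    metrics_num_episodes_for_smoothing=10,
)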
