Episode_reward_mean that ASHA Scheduler expects not found in results

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi,
I am attempting to do hyperparameter optimization (HPO) with Ray Tune on a Soft Actor-Critic (SAC) RL agent that is being trained on a custom environment. I am using Ray 2.40 and Python 3.11.

        from ray import tune
        from ray.rllib.algorithms.sac import SACConfig
        from ray.tune import TuneConfig
        from ray.tune.schedulers import ASHAScheduler
        from ray.tune.search.optuna import OptunaSearch

        sac_config = (
            SACConfig()
            .environment(env=Env.custom_env,
                         env_config=env_config,
                         disable_env_checking=True)
            .env_runners(num_env_runners=5,
                         num_gpus_per_env_runner=0,
                         num_cpus_per_env_runner=2)
            .learners(num_learners=1,
                      num_gpus_per_learner=0.25)
            .framework("torch")
            .training(
                initial_alpha=0.2,
                target_entropy='auto',
                twin_q=True
            )
        )

        # Define the search space
        search_space = {
            "gamma": tune.uniform(0.9, 0.999),  # Discount factor
            "actor_lr": tune.loguniform(1e-5, 1e-3),  # Learning rate
            "critic_lr": tune.loguniform(1e-5, 1e-3),  # Learning rate
            "train_batch_size": tune.choice([128, 256, 512]),  # Batch size
            "tau": tune.uniform(0.005, 0.05),  # Soft update coefficient
        }

        search_alg = OptunaSearch(
            metric="episode_reward_mean",
            mode="max"
        )

        # ASHA scheduler for early stopping
        asha_scheduler = ASHAScheduler(
            metric="episode_reward_mean",
            mode="max",
            max_t=150,
            grace_period=100,
            reduction_factor=2,
        )

        tuner = tune.Tuner(
            "SAC",
            param_space={**sac_config.to_dict(), **search_space},
            tune_config=TuneConfig(
                num_samples=1,  # Number of trials
                search_alg=search_alg,
                scheduler=asha_scheduler,  # Use ASHA for early stopping
                max_concurrent_trials=4,
            )
        )

        results = tuner.fit()

When I run the above code I get the following error:

ValueError: Trial returned a result which did not include the specified metric(s) `episode_reward_mean` that `AsyncHyperBandScheduler` expects 

I looked at the result JSON and saw that there was no key indicating anything that looked like a reward or “return”. I am guessing something changed in Ray 2.40.

The exact same code ran without a glitch on Ray 2.10 (although I did have to remove the env_runners and learners settings from the config, as Ray 2.10 does not have learners and env runners).

Hoping one of you guys can help me out with this. :slight_smile:

In recent versions of RLlib the metric was renamed to episode_return_mean.

So I renamed it to episode_return_mean, and I got the same error:

ValueError: Trial returned a result which did not include the specified metric(s) `episode_return_mean` that `AsyncHyperBandScheduler` expects. Make sure your calls to `tune.report()` include the metric, or set the TUNE_DISABLE_STRICT_METRIC_CHECKING environment variable to 1

I closely inspected the created config dict, and indeed there was no instance of any kind of “return” or “reward”.

I should also note that I had to remove the env_runners and learners settings from my sac_config, as they raised a separate error where the actor kept getting killed during its creation.

I suspect that because I am not explicitly setting env_runners, and episode_return_mean is a key nested inside env_runners, this ValueError: .....episode_return_mean error keeps popping up.
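
If that is the case, I would presumably need to point the searcher and scheduler at the nested key. A rough sketch of what I mean, assuming the new API stack reports the return under results["env_runners"] and that Tune accepts "/"-separated paths into nested result dicts:

# Sketch only: reference the nested metric with a "/"-separated path.
NESTED_METRIC = "env_runners/episode_return_mean"

search_alg = OptunaSearch(metric=NESTED_METRIC, mode="max")

asha_scheduler = ASHAScheduler(
    metric=NESTED_METRIC,
    mode="max",
    max_t=150,
    grace_period=100,
    reduction_factor=2,
)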

@araman5,

The most straightforward way of checking this is to do something like this:

algo = sac_config.build()
results = algo.train()
print(results)

Find the metric you are interested in that was printed to the terminal and use that.
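
If the printed result dict is large, a small helper can list the flattened key paths so anything reward- or return-like is easy to spot (flatten_keys below is just something written for this post, not an RLlib/Tune API):

def flatten_keys(d, prefix=""):
    # Recursively yield "a/b/c"-style key paths of a nested result dict.
    for k, v in d.items():
        path = f"{prefix}{k}"
        if isinstance(v, dict):
            yield from flatten_keys(v, prefix=path + "/")
        else:
            yield path

algo = sac_config.build()
results = algo.train()
for key in sorted(flatten_keys(results)):
    if "return" in key.lower() or "reward" in key.lower():
        print(key)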

So that’s what I ended up doing with a simple Pendulum environment:

sac_config = (
    SACConfig()
    .environment(env="Pendulum-v1")
    # .environment(env=Env.Live_Streaming,
    #              env_config=env_config,
    #              disable_env_checking=True)
    .env_runners(num_env_runners=5)
    .learners(num_learners=1,
              num_gpus_per_learner=0.25)
    .framework("torch")
    .training(
        tau=0.02,
        train_batch_size=128,
        gamma=0.97,
        initial_alpha=0.2,
        target_entropy='auto',
        twin_q=True,
        actor_lr=1.0e-5,
        critic_lr=1.0e-5
    )
)

algo = sac_config.build()
results = algo.train()

print(results)

In the output result there is nothing that looks like a reward or return.

Update:

Instead of printing the result after just one call of algo.train(), I looked at the result JSON after five calls of algo.train(), and now I see results['env_runners']['agent_episodes_returns_mean']['default_policy'].

However, for my own custom environment, I don’t see agent_episodes_returns_mean.

Hi @araman5,

You will not have any return/reward metrics until an episode terminates. Similarly, the return/reward will not update until new episodes terminate. Finally, the return/reward is a moving average over the previous 100 episodes.

The logic for when one training iteration completes is at the link below. Perhaps you can configure one of those settings to suit your needs.

Thanks a lot. This might just prove to be the solution. Appreciate your help, @mannyv

So I figure I would need to change min_sample_timesteps_per_t to something more suitable for my custom environment, which has a fixed episode length of 180.

I don’t see how I can update that from the sac_config dict, though. I see training_intensity in the documentation, but that doesn’t seem to be it.

It is in reporting().
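
For example, something along these lines (a sketch only, assuming these reporting() options are available on AlgorithmConfig in Ray 2.40; the values are just illustrative for a fixed episode length of 180):

sac_config = sac_config.reporting(
    # Require at least one full episode's worth of sampled env steps
    # before a training iteration reports results.
    min_sample_timesteps_per_iteration=180,
    # Average returns over fewer episodes so the metric shows up sooner.
    metrics_num_episodes_for_smoothing=10,
)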
