Metric for PBT in Ray 2.40

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am trying to migrate my project from 1.11 to 2.40, mainly because of numerical instability issues that I was able to narrow down to the RLlib algorithms. Migrating to the new API stack has been a bit of a challenge, but after a few days of trying I can finally get things to run, and I have not noticed the numerical instability issues on 2.40 yet, which is a good sign. However, my full training routine involves performing HPO with PBT.

I define the PBT object as follows:

from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=500,
    metric="evaluation/episode_reward_mean",
    mode="max",
    hyperparam_mutations={...},  # hyperparam ranges and lists (elided)
    custom_explore_fn=postprocess_func,
)

Here is my tune config:

from ray.rllib.algorithms.ppo import PPOConfig

algo_config = (
    PPOConfig()
    .environment(env=ENV_NAME, env_config=env_config)
    .evaluation(
        evaluation_config=eval_env_config,
        evaluation_num_env_runners=4,
        evaluation_interval=500,
        evaluation_duration=500,
        evaluation_duration_unit="episodes",
    )
    .env_runners(num_env_runners=5)
    .framework("torch")
    .rollouts(rollout_fragment_length=200)
    .training(train_batch_size=2000, num_sgd_iter=5)
)


import os

from ray import air, tune

tuner = tune.Tuner(
    "PPO",
    param_space=algo_config.to_dict(),
    run_config=air.RunConfig(
        name="experiment_4",
        storage_path=f"file://{os.getcwd()}/experiments/initial_testing",
        stop={"training_iteration": 10000},
    ),
    tune_config=tune.TuneConfig(num_samples=6, scheduler=pbt),
)

tuner.fit()

I specifically want the mutation decisions to be based on the evaluation reward means, and as far as I can tell this used to work perfectly fine on 1.11. However, on 2.40, I am getting a strange error with the following signature:

File "/home/{}/miniconda3/envs/rluav2/lib/python3.9/site-packages/ray/tune/execution/tune_controller.py", line 1678, in _validate_result_metrics
    raise ValueError(
ValueError: Trial returned a result which did not include the specified metric(s) `evaluation/episode_reward_mean` that `PopulationBasedTraining` expects. Make sure your calls to `tune.report()` include the metric, or set the TUNE_DISABLE_STRICT_METRIC_CHECKING environment variable to 1. Result: {'num_training_step_calls_per_iteration': 1 ....

and it goes on. So evidently, the metric evaluation/episode_reward_mean does not exist within this Result object. I already tried the following:

  • I tried setting the metric to episode_reward_mean (without the evaluation prefix), which is what the example in the docs uses.

  • On TensorBoard, after training commences (when I run without setting the scheduler to pbt), I can see a card labeled “ray/tune/env_runners/episode_return_mean”, so I have tried setting the metric to env_runners/episode_return_mean, episode_return_mean, and ray/tune/env_runners/episode_return_mean; all of these give the same error, since none of them are in the Result object returned initially.

  • I can get rid of the error by setting the metric to something that is present in the result, like env_runners/sample, but this does not help me at all.

  • I also tried setting os.environ["TUNE_DISABLE_STRICT_METRIC_CHECKING"] = "1" within the script, which is what the initial error message seems to suggest, but I get a slightly different error message with essentially the same content:

raise RuntimeError(
RuntimeError: Cannot find metric env_runnes/episode_return_mean in trial result {'num_training_step_calls_per_iteration': 1, ...

With 1.11, I also used to pass the option "always_attach_evaluation_results": True to the config dictionary, but when I pass this inside the evaluation options of algo_config, I get an error:

raise ValueError(msg)
ValueError: `AlgorithmConfig.evaluation(always_attach_evaluation_results=..)` has been deprecated. This setting is no longer needed, b/c Tune does not error anymore (only warns) when a metrics key can't be found in the results.

and it does not proceed.

In summary: I want to perform HPO with PBT based on the evaluation episode reward mean metric, but I cannot seem to proceed to training with that or any useful metric.

Any help would be appreciated.

Hey there hnooh! Welcome to the Ray community :slight_smile:

The issue you’re running into with evaluation/episode_reward_mean not being found in Ray 2.40 is likely because evaluation metrics are handled differently in this version compared to 1.x.

I took a look at the docs and there might be a few things that can help you with this issue, but it does seem to be quite tricky.

In Ray 2.x, evaluation metrics aren’t automatically included in trial results unless explicitly configured. One thing worth checking: since your evaluation_interval is 500, the evaluation block is probably only attached to every 500th result, so it won’t be in the very first result that Tune validates for the PBT metric. You can work around this by tweaking your PPOConfig so the evaluation metrics are present on every iteration. You can read more about PPOConfig here: Algorithms — Ray 2.41.0
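
As a rough sketch of what I mean (the exact key name is an assumption on my part, since I can’t verify it against your setup): on the new API stack the evaluation results appear to be nested under evaluation/env_runners/..., and running evaluation every iteration makes sure the key is already there in the very first result that Tune validates.

from ray.tune.schedulers import PopulationBasedTraining

# Assumption: the evaluation return lives under
# "evaluation" -> "env_runners" -> "episode_return_mean", which Tune flattens
# into the key "evaluation/env_runners/episode_return_mean".
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=500,
    metric="evaluation/env_runners/episode_return_mean",
    mode="max",
    hyperparam_mutations={"lr": [1e-4, 5e-5, 1e-5]},  # placeholder example
)

# Evaluating every iteration means the metric is present from iteration 1 on;
# with evaluation_interval=500 it would only show up every 500th iteration.
algo_config = algo_config.evaluation(evaluation_interval=1)

Evaluating every iteration can of course get expensive with 500 episodes per evaluation, so an alternative is to keep a larger interval and carry the last evaluation value forward in a callback, as in the sketch further down.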

Can you also log the metrics you mentioned via Ray Tune? Ray Tune automatically logs some metrics so it’d be good as a sanity check to see if it’s there at all. You can also write a custom callback to ensure you are manually reporting them.
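
As a sketch of what such a callback could look like (the nested lookup path is an assumption, and PromoteEvalMetric / eval_episode_return_mean are just names I made up for illustration), you could copy the evaluation return into a stable top-level key and carry the last value forward so it exists on every iteration:

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class PromoteEvalMetric(DefaultCallbacks):
    """Copy the evaluation return into a stable top-level result key."""

    def __init__(self):
        super().__init__()
        # Until the first evaluation has run, report -inf (we maximize).
        self._last_eval_return = float("-inf")

    def on_train_result(self, *, algorithm, result, **kwargs):
        # Assumed location of the metric in the new API stack result dict.
        eval_return = (
            result.get("evaluation", {})
            .get("env_runners", {})
            .get("episode_return_mean")
        )
        if eval_return is not None:
            self._last_eval_return = eval_return
        # Always report the key so the strict metric check and PBT can find it,
        # even on iterations where no evaluation ran.
        result["eval_episode_return_mean"] = self._last_eval_return

# Attach it and point PBT at the promoted key instead:
# algo_config = algo_config.callbacks(PromoteEvalMetric)
# pbt = PopulationBasedTraining(..., metric="eval_episode_return_mean", mode="max")

This way PBT always has a value to compare, and the key name is fully under your control regardless of how RLlib nests its results.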

Also, the name might have changed in 2.x as opposed to 1.x. There are a few places in the docs that mention env_runners/episode_return_mean, so maybe you can adjust the name accordingly. You can compare the name of the variable in TensorBoard to the one printed out by the custom callback.
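
A quick way to do that comparison outside of Tune (assuming the algo_config from your snippet builds cleanly) is to run a single training iteration and print the nested result keys in the flattened "a/b/c" form that Tune uses:

# Build the algorithm directly and inspect one result dict.
algo = algo_config.build()
result = algo.train()

def print_keys(d, prefix=""):
    """Recursively print nested result keys in the flattened 'a/b/c' form."""
    for key, value in d.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            print_keys(value, path + "/")
        else:
            print(path)

print_keys(result)
algo.stop()

Whatever shows up there for the evaluation return is the exact string PBT expects as its metric.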

If this is blocking you, you can also set the environment variable TUNE_DISABLE_STRICT_METRIC_CHECKING to 1 (as the error message suggests) so that Tune only warns about the missing metric and training continues while you debug. Environment variables used by Ray Tune — Ray 2.41.0

There also is a handy migration guide for the API that might be helpful here: New API stack migration guide — Ray 2.41.0