Cannot retrieve checkpoint from Tune due to nested metrics of RLlib

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version: 2.49.2
  • Python version: 3.10.18
  • OS: Windows 11 24H2
  • Cloud/Infrastructure: Local
  • Other libs/tools (if relevant): PyTorch

3. What happened vs. what you expected:

  • Expected:

I’ve been using tune.Tuner for a PPO training trial. After training finished, I wanted to retrieve the best checkpoint using the following lines:

from ray.rllib.utils.metrics import ENV_RUNNER_RESULTS, EPISODE_RETURN_MEAN

best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)

This worked fine back in Ray 2.8.1 with the metric “episode_reward_mean”. But now that all metrics are aggregated under “env_runners”, it seems Tune cannot find keys in nested metrics.
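For context, here is a rough sketch of the old versus new metric paths (same best_result as above; the metric strings are the ones I pass in, not something Tune generates):

# Ray 2.8.x (old API stack): flat, top-level metric key
ckpt = best_result.get_best_checkpoint("episode_reward_mean", "max")

# Ray 2.49.x (new API stack): the same metric is nested under "env_runners"
ckpt = best_result.get_best_checkpoint("env_runners/episode_return_mean", "max")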

  • Actual:

Hi Morphlng!

I think you’re right, env_runners/episode_return_mean may not be included at the top level for each checkpoint, especially if it is not reported in the first iteration or not present in every checkpoint’s metrics after they’ve been nested/aggregated.

They talk a bit about it in this GitHub issue: [Tune] Air output metrics are replaced by irrelevant inferred metrics · Issue #45547 · ray-project/ray

For Ray Tune, the metric would look as follows: "evaluation/env_runners/episode_return_mean".
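As a rough sketch (assuming the same get_best_checkpoint call as in the original post, and that evaluation results are being reported):

ckpt = best_result.get_best_checkpoint("evaluation/env_runners/episode_return_mean", "max")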

Maybe you can try that and see if that works? Or do you have a list of available checkpoint metrics from Tune possibly? It’s cut off in the screenshot.

Thanks for the reply! I tried to reproduce the problem with CartPole, and the full error message is:

RuntimeError: Invalid metric name env_runners/episode_return_mean! You may choose from the following metrics: dict_keys(['timers', 'env_runners', 'learners', 'num_training_step_calls_per_iteration', 'num_env_steps_sampled_lifetime', 'fault_tolerance', 'env_runner_group', 'done', 'training_iteration', 'trial_id', 'date', 'timestamp', 'time_this_iter_s', 'time_total_s', 'pid', 'hostname', 'node_ip', 'config', 'time_since_restore', 'iterations_since_restore', 'perf', 'experiment_tag']).

I’ve also tried using air.Result to load from disk, and it works perfectly. This confused me a lot since, according to the traceback, they call the same function.
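For reference, a minimal sketch of the loading path that works for me (the trial directory below is a placeholder; I import Result from ray.train, which is where it lives in recent releases):

from ray.train import Result

# hypothetical trial directory on local disk
result = Result.from_path("~/ray_results/PPO_2025/PPO_CartPole-v1_xxxxx")
ckpt = result.get_best_checkpoint("env_runners/episode_return_mean", "max")
print(ckpt.path)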

There is another thing I want to report that is related to this problem.

It seems like RLlib’s default result dict does not contain the data that air.Result needs to read from, namely checkpoint_dir_name. So if you pass the directory to Result.from_path, it is likely to report an error.

I have written a Callback to manually add this column as a workaround. But it would be great if the Ray team could fix it officially.

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.callbacks.callbacks import RLlibCallback
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger


class CheckpointCallback(RLlibCallback):
    def on_train_result(
        self,
        *,
        algorithm: Algorithm,
        metrics_logger: MetricsLogger = None,
        result: dict,
        **kwargs,
    ):
        if algorithm._storage:
            # The checkpoint for this iteration has not been created yet, so
            # temporarily bump the index to compute the upcoming checkpoint's
            # directory name, write it into the result dict, then restore it.
            algorithm._storage.current_checkpoint_index += 1
            result["checkpoint_dir_name"] = algorithm._storage.checkpoint_dir_name
            algorithm._storage.current_checkpoint_index -= 1
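For completeness, a minimal sketch of how the callback can be attached (assuming the standard AlgorithmConfig.callbacks() API and a CartPole setup, as in my repro):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .callbacks(CheckpointCallback)
)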

Hello! Thank you for surfacing this! Have you had time to file a GitHub issue for these problems, or should I do it? :eyes: I’ll file a ticket for our RLlib team if needed

Great! I’ve filed an issue on GitHub: [RLlib] Cannot retreive checkpoint from tune due to nested metrics of RLlib · Issue #57533 · ray-project/ray. I hope this can be solved in the next release.
