[Rllib, Tune, AIR] Checkpointing as per custom metric minimum

PhilippWillms · November 11, 2024, 8:06pm

Hi Ray community,

I am successfully using the checkpointing combination of RLlib, Tune, and AIR, as shown in the example below.

It would be beneficial for me to create a checkpoint not at a fixed interval, such as every 10th iteration, but every time a metric reaches a better value, e.g. “episode_reward_max” or even a custom metric.

So in words, it would be like “create a checkpoint every time you reach a lower value for episode_reward_max than you did in all iterations before”.

Does anyone have experience with that?

Example that creates a checkpoint at a fixed interval of 10 iterations:

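from ray import air, tune

# `config` is the PPO parameter space, defined elsewhere.
tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=air.RunConfig(
        checkpoint_config=air.CheckpointConfig(
            checkpoint_score_attribute="episode_reward_mean",
            checkpoint_score_order="max",
            checkpoint_frequency=10,
            checkpoint_at_end=True,
        ),
    ),
    # tune_config is an argument of Tuner, not of RunConfig.
    tune_config=tune.TuneConfig(num_samples=2),
)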
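
The rule described in words above ("checkpoint only when the metric beats every earlier iteration") can be sketched in plain Python. This is a minimal illustration, not a Ray API: `ImprovementTracker`, its `mode` argument, and the sample values are all hypothetical and only show the decision logic.

```python
from typing import Optional


class ImprovementTracker:
    """Hypothetical helper (not part of Ray): flags values that set a new best."""

    def __init__(self, mode: str = "min") -> None:
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.best: Optional[float] = None

    def is_improvement(self, value: float) -> bool:
        """Return True and update the stored best if `value` strictly beats it."""
        if self.best is None:
            improved = True
        elif self.mode == "min":
            improved = value < self.best
        else:
            improved = value > self.best
        if improved:
            self.best = value
        return improved


# Made-up metric values, one per training iteration.
values = [5.0, 4.0, 4.5, 3.0, 3.0, 2.5]
tracker = ImprovementTracker(mode="min")

# Iterations (1-based) at which a checkpoint would be written.
checkpoint_iterations = [
    i for i, v in enumerate(values, start=1) if tracker.is_improvement(v)
]
print(checkpoint_iterations)
```

The comparison is strict, so a value that only ties the previous best does not trigger another checkpoint.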

PhilippWillms · November 25, 2024, 11:48pm

The following code snippet is something that could help me. The idea is to always keep on disk the last 2 checkpoints at which training reached a new minimum value of the custom metric number. The custom metric is recorded in on_episode_end() of a custom callback. However, I observe that only checkpoint_at_end=True takes effect, and a single checkpoint is created at the end of the trial.

Which piece is missing?

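from ray import air, tune

# `config` is the PPO parameter space, defined elsewhere.
tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=air.RunConfig(
        checkpoint_config=air.CheckpointConfig(
            num_to_keep=2,
            checkpoint_score_attribute="custom_metrics/number_min",
            checkpoint_score_order="min",
            checkpoint_at_end=True,
        ),
    ),
    # tune_config is an argument of Tuner, not of RunConfig.
    tune_config=tune.TuneConfig(num_samples=2),
)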
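
As far as the CheckpointConfig documentation describes it, checkpoint_score_attribute and num_to_keep only decide which already-written checkpoints stay on disk, while checkpoint_frequency (default 0, meaning no periodic checkpoints) decides when checkpoints are written. That may be why only the end-of-trial checkpoint appears with the configuration above. The plain-Python sketch below illustrates the retention idea only; `lookup`, `retain_best`, and the sample results are hypothetical and are not Ray's implementation.

```python
import heapq
from typing import Any, Dict, List, Tuple


def lookup(result: Dict[str, Any], path: str) -> Any:
    """Resolve a "/"-separated path such as "custom_metrics/number_min"."""
    node: Any = result
    for key in path.split("/"):
        node = node[key]
    return node


def retain_best(
    scored: List[Tuple[int, float]], num_to_keep: int, order: str = "min"
) -> List[int]:
    """Return the iterations of the `num_to_keep` best-scoring checkpoints."""
    pick = heapq.nsmallest if order == "min" else heapq.nlargest
    best = pick(num_to_keep, scored, key=lambda item: item[1])
    return sorted(iteration for iteration, _ in best)


# Made-up results, assuming one checkpoint was written per iteration.
results = [
    {"training_iteration": 1, "custom_metrics": {"number_min": 7.0}},
    {"training_iteration": 2, "custom_metrics": {"number_min": 5.0}},
    {"training_iteration": 3, "custom_metrics": {"number_min": 6.0}},
    {"training_iteration": 4, "custom_metrics": {"number_min": 3.0}},
]

# Pair each iteration with its score, then keep the two lowest scores.
scored = [
    (r["training_iteration"], lookup(r, "custom_metrics/number_min"))
    for r in results
]
kept = retain_best(scored, num_to_keep=2, order="min")
print(kept)
```

Under this reading, retention can only choose among checkpoints that exist, so a non-zero checkpoint_frequency would be needed for there to be anything to rank.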

PhilippWillms · June 30, 2025, 5:34pm

@justinvyu, @christina: This older thread has now become more relevant. I do not expect this to require development effort, just a refined understanding of how Tune checkpointing works.

Thanks for your help!

arturn · July 1, 2025, 8:45am

Hi @PhilippWillms! Sorry this is not solved yet. Your expectations align with mine.
Is there a related GitHub issue with a reproduction script? That would be suitable to track and solve this. If not, could you please file one, and I’ll reproduce it and take further steps.

PhilippWillms · July 1, 2025, 7:13pm

Done, I created issue #54251 in ray-project/ray on GitHub: “[RLlib,Tune,AIR] Checkpointing scoring per custom metric does not work”.

christina · July 2, 2025, 9:56pm

Thank you for surfacing this issue and creating a ticket!
