Which attributes can be used in `checkpoint_score_attr` when using `tune.run`?

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

I am using the Tune API and I would like to checkpoint the top 5 models according to a performance metric. I believe this can be done with the checkpoint_score_attr and keep_checkpoints_num parameters of the tune.run API. So far my code looks like this:

    results = tune.run(
        'PPO',
        config=config,
        stop={'training_iteration': 25000},
        checkpoint_at_end=True,
        checkpoint_score_attr='episode_reward_mean',
        keep_checkpoints_num=5,
    )

With this code, however, no checkpoints are being saved, which may indicate that the value passed to checkpoint_score_attr is invalid in some way. Is there a way to know which values are valid for the trainer I am using, and is there any reason why the checkpoints aren't being saved as is?

Hey @rohin_dasari,

I’m opening a PR to throw something more informative here.
In the meantime: if you change the stopping criteria to something invalid, Tune will tell you what the results dict looks like, and from there you can read off the keys, which are also the valid values for checkpoint_score_attr.
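
For example, something like this (a rough sketch reusing the snippet from your question; the exact error wording depends on your Ray version):

    results = tune.run(
        'PPO',
        config=config,
        # Intentionally invalid stop key: Tune will complain and print the keys of
        # the results dict, which are also the valid values for checkpoint_score_attr.
        stop={'not_a_real_metric': 1},
    )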

Hi @rohin_dasari,

I tested this in one of my projects and it runs fine for me with checkpoint_score_attr. I tested with Ray 1.11.0. I set keep_checkpoints_num=5 and checkpoint_score_attr="episode_reward_mean", and it stores 5+1 checkpoints (the first one always gets stored). The only difference was that I set local_dir="path/to/folder" (I don't believe this is the cause of your issue).

Hey Lars,

Yes, the example runs fine on master, too. But in case you use something like checkpoint_score_attr='episode_reward_means' there, you won't get an error telling you the actual available dict keys.

PR is open.

Hi Artur,
Got it. As far as I can see from Rohin's issue, he wrote episode_reward_mean correctly and still did not get any checkpoints. Was this due to some other error that has already been resolved?

To be honest, I would like to see a script that reproduces the error, if this is an actual issue. @rohin_dasari
This could just be an issue with the algorithm not producing a proper results dict, but until we can reproduce it, that's just a guess.

Strange. I ran the following example code to try and reproduce the problem:

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v3

# Based on code from github.com/parametersharingmadrl/parametersharingmadrl

if __name__ == "__main__":
    def env_creator(args):
        return PettingZooEnv(waterworld_v3.env())

    env = waterworld_v3.env()
    env.reset()
    register_env("waterworld", env_creator)

    tune.run(
        "PPO",
        stop={"training_iteration": 5},
        checkpoint_freq=1,
        config={
            # Environment specific
            "env": "waterworld",
            # General
            "num_gpus": 0,
            "num_workers": 1,
            # Method specific
            "multiagent": {
                "policies": set(env.agents),
                "policy_mapping_fn": (lambda agent_id, episode, **kwargs: agent_id),
            },
        },
        checkpoint_score_attr="episode_reward_mean",
        keep_checkpoints_num=2,
    )

When I ran this, the checkpoints showed up as expected. But for some reason it’s still not working with my original script and I’m having trouble identifying the root cause. After every training iteration, the results dict gets printed out:

  agent_timesteps_total: 8000
  custom_metrics: {}
  date: 2022-04-19_12-04-21
  done: false
  episode_len_mean: 58.80597014925373
  episode_media: {}
  episode_reward_max: 116.0
  episode_reward_mean: -25.432835820895523
  episode_reward_min: -234.0
  episodes_this_iter: 67
  episodes_total: 67
...

It seems to contain episode_reward_mean as an attribute. Is there a reason why an attribute would show up in the results dict and not be recognized by the checkpoint_score_attr parameter?

Can you reduce your original script to something that I can run? That would be great! 🙂

Sorry, there was an error in my previous example. I am actually able to reproduce the issue with this small script:

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v3

# Based on code from github.com/parametersharingmadrl/parametersharingmadrl

if __name__ == "__main__":
    def env_creator(args):
        return PettingZooEnv(waterworld_v3.env())

    env = waterworld_v3.env()
    env.reset()
    register_env("waterworld", env_creator)

    result = tune.run(
        "PPO",
        stop={"training_iteration": 5},
        config={
            # Environment specific
            "env": "waterworld",
            # General
            "num_gpus": 0,
            "num_workers": 1,
            # Method specific
            "multiagent": {
                "policies": set(env.agents),
                "policy_mapping_fn": (lambda agent_id, episode, **kwargs: agent_id),
            },
        },
        checkpoint_score_attr="episode_reward_mean",
        keep_checkpoints_num=2
    )
    print(result)

In my previous example, I had set checkpoint_freq=1 in the call to tune.run, which was saving a checkpoint every training iteration. Once I removed it, I expected to still see checkpoints based on the checkpoint_score_attr attribute, but I don't see any.

Hey @rohin_dasari ,

try updating your tune.run() call with the following arguments:

num_samples=3,
metric="episode_reward_mean",
mode="max",
checkpoint_score_attr="episode_reward_mean",
keep_checkpoints_num=2,
checkpoint_freq=1,

So that the complete script looks similar to this one:

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v3

# Based on code from github.com/parametersharingmadrl/parametersharingmadrl

if __name__ == "__main__":
    def env_creator(args):
        return PettingZooEnv(waterworld_v3.env())

    env = waterworld_v3.env()
    env.reset()
    register_env("waterworld", env_creator)

    result = tune.run(
        "PPO",
        stop={"training_iteration": 3},
        config={
            # Environment specific
            "env": "waterworld",
            # General
            "num_gpus": 0,
            "num_workers": 1,
            # Method specific
            "multiagent": {
                "policies": set(env.agents),
                "policy_mapping_fn": (lambda agent_id, episode, **kwargs: agent_id),
            },
        },
        num_samples=3,
        metric="episode_reward_mean",
        mode="max",
        checkpoint_score_attr="episode_reward_mean",
        keep_checkpoints_num=2,
        checkpoint_freq=1,
    )
    print(result)

After running the three trials (num_samples=3!), you will be able to see that each trial has two checkpoints in separate directories.

metric and mode are not strictly needed here; you simply forgot to set checkpoint_freq to a non-zero value.
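
In other words, a minimal sketch of the change relative to your original call (everything else can stay as you had it):

    result = tune.run(
        "PPO",
        config=config,
        stop={"training_iteration": 5},
        checkpoint_freq=1,  # this was missing: write a checkpoint every iteration
        checkpoint_score_attr="episode_reward_mean",
        keep_checkpoints_num=2,  # keep only the 2 best of the written checkpoints
    )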

Hope this helps!
CC: @Lars_Simon_Zehnder

Ah, yes this works as expected! I didn’t realize checkpoint_freq needed to be set to a non-zero value, but that makes sense. Thanks!