Which attributes can be used in `checkpoint_score_attr` when using `tune.run`?

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

I am using the Tune API and I would like to checkpoint the top 5 models according to a performance metric. I believe this can be done with the checkpoint_score_attr and keep_checkpoints_num parameters of the tune.run API. So far my code looks like this:

    results = tune.run(
        'PPO',
        config=config,
        stop={'training_iteration': 25000},
        checkpoint_at_end=True,
        checkpoint_score_attr='episode_reward_mean',
        keep_checkpoints_num=5,
    )

With this code, however, no checkpoints are being saved, which may indicate that the value passed to checkpoint_score_attr is invalid in some way. Is there a way to know which values are valid for the trainer I am using, and is there any reason why the checkpoints aren't being saved as is?

Hey @rohin_dasari,

I’m opening a PR to throw something more informative here.
In the meantime: if you change the stopping criteria to something invalid, Tune will tell you what the results dict looks like, and from there you can read off the keys, which are also the valid values for checkpoint_score_attr.
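
For example, something like this (a rough sketch reusing the snippet from your question; the exact error wording depends on your Ray version):

    results = tune.run(
        'PPO',
        config=config,
        # Intentionally invalid stop key: Tune will complain and print the keys of
        # the results dict, which are also the valid values for checkpoint_score_attr.
        stop={'not_a_real_metric': 1},
    )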

Hi @rohin_dasari,

I tested this in one of my projects and it runs fine for me with checkpoint_score_attr. I tested with Ray 1.11.0. I set keep_checkpoints_num=5 and checkpoint_score_attr="episode_reward_mean", and it stores 5+1 checkpoints (the first one always gets stored). The only difference was that I set local_dir="path/to/folder" (I don't believe this is the cause of your issue).

Hey Lars,

Yes, the example runs fine on master, too. But in case you use something like checkpoint_score_attr='episode_reward_means' there, you won't get an error telling you the actual available dict keys.

PR is open.

Hi Artur,
Got it. As far as I can see from Rohin's issue, he wrote episode_reward_mean correctly and still did not get any checkpoints. Was this due to some other error that has already been resolved?

To be honest, I would like to see a script that reproduces the error, if this is an actual issue. @rohin_dasari
This could just be an issue with the algorithm not producing a proper results dict, but until we can reproduce it, that's just a guess.

Strange. I ran the following example code to try and reproduce the problem:

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v3

# Based on code from github.com/parametersharingmadrl/parametersharingmadrl

if __name__ == "__main__":
    def env_creator(args):
        return PettingZooEnv(waterworld_v3.env())

    env = waterworld_v3.env()
    env.reset()
    register_env("waterworld", env_creator)

    tune.run(
        "PPO",
        stop={"training_iteration": 5},
        checkpoint_freq=1,
        config={
            # Environment specific
            "env": "waterworld",
            # General
            "num_gpus": 0,
            "num_workers": 1,
            # Method specific
            "multiagent": {
                "policies": set(env.agents),
                "policy_mapping_fn": (lambda agent_id, episode, **kwargs: agent_id),
            },
        },
        checkpoint_score_attr="episode_reward_mean",
        keep_checkpoints_num=2,
    )

When I ran this, the checkpoints showed up as expected. But for some reason it’s still not working with my original script and I’m having trouble identifying the root cause. After every training iteration, the results dict gets printed out:

  agent_timesteps_total: 8000
  custom_metrics: {}
  date: 2022-04-19_12-04-21
  done: false
  episode_len_mean: 58.80597014925373
  episode_media: {}
  episode_reward_max: 116.0
  episode_reward_mean: -25.432835820895523
  episode_reward_min: -234.0
  episodes_this_iter: 67
  episodes_total: 67
...

It seems to contain episode_reward_mean as an attribute. Is there a reason why an attribute would show up in the results dict and not be recognized by the checkpoint_score_attr parameter?

Can you reduce your original script to something that I can run? That would be great! 🙂

Sorry, there was an error in my previous example. I am actually able to reproduce the issue with this small script:

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v3

# Based on code from github.com/parametersharingmadrl/parametersharingmadrl

if __name__ == "__main__":
    def env_creator(args):
        return PettingZooEnv(waterworld_v3.env())

    env = waterworld_v3.env()
    env.reset()
    register_env("waterworld", env_creator)

    result = tune.run(
        "PPO",
        stop={"training_iteration": 5},
        config={
            # Environment specific
            "env": "waterworld",
            # General
            "num_gpus": 0,
            "num_workers": 1,
            # Method specific
            "multiagent": {
                "policies": set(env.agents),
                "policy_mapping_fn": (lambda agent_id, episode, **kwargs: agent_id),
            },
        },
        checkpoint_score_attr="episode_reward_mean",
        keep_checkpoints_num=2
    )
    print(result)

In my previous example, I had set checkpoint_freq=1 in the call to tune.run, which was saving a checkpoint every training iteration. Once I removed it, I expected to still see checkpoints based on the checkpoint_score_attr attribute, but I don't see any.

Hey @rohin_dasari ,

try updating your tune.run() call with the following arguments:

num_samples=3,
metric="episode_reward_mean",
mode="max",
checkpoint_score_attr="episode_reward_mean",
keep_checkpoints_num=2,
checkpoint_freq=1,

So that the complete script looks similar to this one:

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v3

# Based on code from github.com/parametersharingmadrl/parametersharingmadrl

if __name__ == "__main__":
    def env_creator(args):
        return PettingZooEnv(waterworld_v3.env())

    env = waterworld_v3.env()
    env.reset()
    register_env("waterworld", env_creator)

    result = tune.run(
        "PPO",
        stop={"training_iteration": 3},
        config={
            # Environment specific
            "env": "waterworld",
            # General
            "num_gpus": 0,
            "num_workers": 1,
            # Method specific
            "multiagent": {
                "policies": set(env.agents),
                "policy_mapping_fn": (lambda agent_id, episode, **kwargs: agent_id),
            },
        },
        num_samples=3,
        metric="episode_reward_mean",
        mode="max",
        checkpoint_score_attr="episode_reward_mean",
        keep_checkpoints_num=2,
        checkpoint_freq=1,
    )
    print(result)

After running the three trials (num_samples=3!), you will be able to see that each trial has two checkpoints in separate directories.

metric and mode are not strictly needed here; you simply forgot to set checkpoint_freq to a non-zero value.
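
In other words, a minimal sketch of the change relative to your original call (everything else can stay as you had it):

    result = tune.run(
        "PPO",
        config=config,
        stop={"training_iteration": 5},
        checkpoint_freq=1,  # this was missing: write a checkpoint every iteration
        checkpoint_score_attr="episode_reward_mean",
        keep_checkpoints_num=2,  # keep only the 2 best of the written checkpoints
    )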

Hope this helps!
CC: @Lars_Simon_Zehnder

Ah, yes this works as expected! I didn’t realize checkpoint_freq needed to be set to a non-zero value, but that makes sense. Thanks!