Use `checkpoint_score_attr` with custom metric

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.

USING RAY==1.8.0 (cannot upgrade for now)

I’m trying to use the checkpoint_score_attr param in tune.run(), but I’m having some difficulty understanding when and how it works. From what I read, I should set checkpoint_score_attr=metric, where metric is a key in the result dictionary returned via tune.run().results. How can I add things to this result dictionary?

Another question: how can I tell tune to look at the evaluation metric and not at the training one?

Hey @fedetask, can you share what your Trainable looks like?

In general, if you’re using a function Trainable then you can report the metric with tune.report(**kwargs). With this, you can pass in the evaluation metric, the training metric, or both!
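For example, here’s a quick sketch (the trainable, the metric names, and the dummy values are all just placeholders):

import os

from ray import tune


def my_trainable(config):  # hypothetical function Trainable
    for step in range(10):
        train_score = float(step)        # placeholder training metric
        eval_score = float(step) * 0.9   # placeholder evaluation metric

        # Save a checkpoint so that Tune has something to keep/rank.
        with tune.checkpoint_dir(step=step) as checkpoint_dir:
            with open(os.path.join(checkpoint_dir, "state.txt"), "w") as f:
                f.write(str(step))

        # Every keyword passed to tune.report() becomes a key in the result
        # dict, so either metric can be used as checkpoint_score_attr.
        tune.report(train_metric=train_score, eval_metric=eval_score)


tune.run(
    my_trainable,
    keep_checkpoints_num=3,
    checkpoint_score_attr="eval_metric",  # rank kept checkpoints by the evaluation metric
)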

Let me know if this documentation helps!

I should have specified: I’m using Tune to train RLlib agents, so I do something like this:

from ray import tune
from ray.rllib.agents.dqn import DQNTrainer

config = {
    'callbacks': MyCallback,  # Computes useful metrics during training
    'evaluation_config': {
        'callbacks': MyEvaluationCallback  # Computes useful metrics in evaluation episodes
    }
}
tune.run(run_or_experiment=DQNTrainer, config=config)

(I skipped irrelevant configuration elements such as the environment, stopping criteria, etc.)

The metrics that I would like Tune to consider when keeping the best checkpoints are computed by MyEvaluationCallback, which looks like this:

from typing import Dict, Optional

from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.env import BaseEnv
from ray.rllib.evaluation import MultiAgentEpisode, RolloutWorker
from ray.rllib.policy import Policy
from ray.rllib.utils.typing import PolicyID


class MyEvaluationCallback(DefaultCallbacks):

    # ... other methods that save info in episode.user_data

    def on_episode_end(self,
                       *,
                       worker: RolloutWorker,
                       base_env: BaseEnv,
                       policies: Dict[PolicyID, Policy],
                       episode: MultiAgentEpisode,
                       env_index: Optional[int] = None,
                       **kwargs) -> None:
        # Useful metrics go into the episode.custom_metrics dict
        # ("my_metric" here is just a placeholder key).
        episode.custom_metrics["my_metric"] = ...

Should I add a tune.report() call inside on_episode_end()?

Ah gotcha, in that case you should be able to store the (evaluation) metric as part of custom_metrics and then reference it for checkpoint_score_attr.
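Roughly something like the sketch below (untested on 1.8.0, so treat it as a starting point): "my_metric" stands in for whatever key you write into episode.custom_metrics, evaluation_interval and checkpoint_freq are assumptions needed so that evaluation actually runs and checkpoints actually get created, and RLlib aggregates each custom metric into <name>_mean/_min/_max with evaluation results nested under "evaluation/", which I believe you can reference with a "/"-separated path (worth double-checking on your version).

from ray import tune
from ray.rllib.agents.dqn import DQNTrainer

config = {
    'callbacks': MyCallback,                # as defined in your post
    'evaluation_interval': 5,               # assumption: evaluation must run for eval metrics to appear
    'evaluation_config': {
        'callbacks': MyEvaluationCallback,  # as defined in your post
    },
}

tune.run(
    run_or_experiment=DQNTrainer,
    config=config,
    checkpoint_freq=5,       # assumption: create a checkpoint every 5 training iterations
    keep_checkpoints_num=3,  # keep only the 3 best checkpoints...
    # ...ranked by the evaluation-time custom metric; "my_metric" is a placeholder
    # for the key you write into episode.custom_metrics.
    checkpoint_score_attr="evaluation/custom_metrics/my_metric_mean",
)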

See Callbacks and Custom Metrics for some more info and examples!