Store best checkpoints according to evaluation metrics

I want Ray to store the n best checkpoints according to an evaluation metric. I set the CheckpointConfig as follows:

result = tune.Tuner(
    ...
    checkpoint_config=air.CheckpointConfig(
        checkpoint_frequency=10,
        checkpoint_at_end=True,
        num_to_keep=4,
        checkpoint_score_attribute='evaluation/custom_metrics/my_metric',
    )
)

However, I get a "Result dict has no key" error, which suggests that the evaluation metrics are not present in the result dict at the time the checkpoint is scored. What am I missing? How can I set checkpoint_score_attribute to use an evaluation metric?
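To narrow this down, here is a minimal sketch of how I would check whether the key exists in a given result. It assumes that nested result dicts are addressed by joining keys with "/" (as the score attribute above implies); flatten_result is a hypothetical helper written for illustration, not a Ray API, and the two result dicts are made-up examples. One thing worth ruling out is that evaluation simply did not run on the iterations where a checkpoint is taken.

```python
from typing import Any, Dict


def flatten_result(result: Dict[str, Any], delimiter: str = "/") -> Dict[str, Any]:
    """Flatten a nested dict into single-level keys joined by `delimiter`.

    Hypothetical helper for illustration only; not part of Ray.
    """
    flat: Dict[str, Any] = {}
    for key, value in result.items():
        if isinstance(value, dict) and value:
            # Recurse into nested dicts and prefix their keys.
            for sub_key, sub_value in flatten_result(value, delimiter).items():
                flat[f"{key}{delimiter}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# Made-up result from an iteration in which no evaluation ran.
result_without_eval = {"training_iteration": 10, "episode_reward_mean": 1.5}

# Made-up result from an iteration in which evaluation did run.
result_with_eval = {
    "training_iteration": 20,
    "evaluation": {"custom_metrics": {"my_metric": 0.8}},
}

score_key = "evaluation/custom_metrics/my_metric"
print(score_key in flatten_result(result_without_eval))  # False
print(score_key in flatten_result(result_with_eval))     # True
```

If the key only shows up on some iterations, the mismatch between when evaluation runs and when checkpoints are written would be my first suspect, though I have not confirmed that this is the cause here.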