Can't properly restore a result trained with RLlib using ray.train.Result

Hi! I want to check the corresponding metrics for each of the checkpoints saved when training with RLlib and Ray Tune. I've referred to the ray.train.Result API and tried to restore a result from its path. However, the restored metrics_df has no "checkpoint_dir_name" column, so loading the result fails.

This can be easily reproduced with the following code:

from ray.rllib.algorithms.ppo import PPOConfig
from ray import tune
from ray.train import RunConfig, CheckpointConfig

tuner = tune.Tuner(
    "PPO",
    param_space=PPOConfig().environment("CartPole-v1").to_dict(),
    run_config=RunConfig(
        # Checkpoint every iteration and once more at the end of training.
        checkpoint_config=CheckpointConfig(checkpoint_at_end=True, checkpoint_frequency=1),
        stop={"training_iteration": 3},
    ),
)

# get_best_result() needs a metric/mode here since none were set in a
# TuneConfig; "episode_reward_mean" is the old-API-stack metric name.
best_result = tuner.fit().get_best_result(metric="episode_reward_mean", mode="max")

# Loading
from ray.train import Result

restored_result = Result.from_path(best_result.path)

The script above raises a KeyError, complaining that there is no "checkpoint_dir_name" column in the metrics_df.
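
As far as I can tell, Result.from_path rebuilds metrics_df from the progress.csv that Tune writes into the trial directory, so the missing column can be confirmed directly (a minimal check, assuming the default CSV logger is on):

import pandas as pd
from pathlib import Path

# "checkpoint_dir_name" never shows up among the logged columns, which
# matches the KeyError above.
df = pd.read_csv(Path(best_result.path) / "progress.csv")
print("checkpoint_dir_name" in df.columns)  # -> False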

It seems the problem is that RLlib does not autofill the "checkpoint_dir_name" metric that both Train and Tune require. Adding a custom callback works around this:

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.callbacks import DefaultCallbacks

class CheckpointCallback(DefaultCallbacks):
    def on_train_result(self, *, algorithm: Algorithm, result: dict, **kwargs) -> None:
        # Fill in the metric that Train/Tune expect but RLlib doesn't report.
        if algorithm._storage:
            result["checkpoint_dir_name"] = algorithm._storage.checkpoint_dir_name

config = PPOConfig().callbacks(CheckpointCallback)

But even with this callback, I still can't get the metric row for the final checkpoint, presumably because the checkpoint created by checkpoint_at_end=True is written after the last on_train_result call, so no result row ever carries its directory name.
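
For now I'm pairing checkpoint directories with metric rows by hand. This is only a sketch under my own assumptions: with checkpoint_frequency=1, checkpoint index i was taken after training iteration i + 1, any extra checkpoint_at_end copy maps onto the last iteration, and "episode_reward_mean" is the old-API-stack metric name.

import pandas as pd
from pathlib import Path

trial_dir = Path(best_result.path)
df = pd.read_csv(trial_dir / "progress.csv")

for ckpt_dir in sorted(trial_dir.glob("checkpoint_*")):
    # checkpoint_000000 -> iteration 1, checkpoint_000001 -> iteration 2, ...
    # Clamp so a trailing checkpoint_at_end duplicate maps to the final row.
    index = int(ckpt_dir.name.split("_")[-1])
    iteration = min(index + 1, int(df["training_iteration"].max()))
    row = df[df["training_iteration"] == iteration].iloc[0]
    print(ckpt_dir.name, "->", row["episode_reward_mean"])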