I am not entirely sure if it's relevant for your case, but if not, it's at least nice to know: did you know that episode_return_mean is smoothed over config.metrics_num_episodes_for_smoothing episodes? See the topic I just posted:
In short, the min/mean/max you obtain from local_env_runner.get_metrics() are computed over the last metrics_num_episodes_for_smoothing sampled episodes; they are not bound to a single iteration.
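To illustrate what that means, here is a minimal sketch in plain Python. It is not RLlib's actual implementation; `smoothed_returns`, `iterations`, and `window` are made-up names, and `window` only stands in for metrics_num_episodes_for_smoothing. It just shows how a window over the last N episodes can mix episodes from different iterations.

```python
from collections import deque


def smoothed_returns(iterations, window):
    """Report (min, mean, max) over the last `window` episodes after each iteration.

    `iterations` is a list of lists: the episode returns sampled in each
    iteration. Hypothetical helper, not part of RLlib.
    """
    recent = deque(maxlen=window)  # old episodes fall out automatically
    report = []
    for episode_returns in iterations:
        recent.extend(episode_returns)
        if recent:
            report.append((min(recent), sum(recent) / len(recent), max(recent)))
        else:
            report.append(None)  # nothing sampled yet
    return report


# Iteration 1 samples three episodes, iteration 2 samples only one.
iterations = [[1.0, 2.0, 3.0], [10.0]]
result = smoothed_returns(iterations, window=3)
print(result)
```

With a window of 3, the second entry still contains two episodes from the first iteration, so its mean is 5.0 rather than the 10.0 you would get from the second iteration alone.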
Furthermore, depending on your Ray version, restoring metrics is broken; see the GitHub issue "[RLlib] Checkpoint metrics loading with Tune is broken in 2.47.0" (Issue #53877, ray-project/ray). In your case I think the smoothing over the old episodes (if your window reaches back that far) can be off or lost, so you possibly only get the smoothed value from episodes sampled after you loaded the checkpoint.
You may have to cross-check whether things are actually correct and just not logged the way you would expect.
Cheers, and good luck.