Loading experiment analysis on a different machine than the one the experiment was run on

Hello, I am training a PPO agent with RLlib on one machine running Windows, and then I copy the experiment folder to a different machine running Linux for testing purposes.

To get the analysis object, I perform the following operation, hoping to be able to call get_best_trial() and get_best_checkpoint() and thereafter build the PPO algorithm from the checkpoint:

from ray.tune import ExperimentAnalysis

analysis_object = ExperimentAnalysis(Linux_experiment_path,
                                     default_metric=metric,
                                     default_mode=mode)

However, the path in analysis_object always refers to the original Windows path, which produces errors.

What is the proper workflow in this case?

Same issue!
Please let me know if you find a solution.

This is a known issue that’s being tracked here: [Train/Tune] Restore an experiment from a different machine/path · Issue #40585 · ray-project/ray · GitHub

Targeting a fix for Ray 2.9, but will keep this thread updated if a nightly is available earlier for you to use. Thanks for raising this issue.

Thank you very much @justinvyu
I am glad it is a known issue.
I had been trying to solve this for days before moving on, since if I trained models on the remote machine I could not load the checkpoints on my local machine to test, analyse, etc.

This should be fixed in the nightly version of ray by this PR: [tune/train] Restore Tuner and results properly from moved storage path by justinvyu · Pull Request #40647 · ray-project/ray · GitHub.

https://docs.ray.io/en/latest/ray-overview/installation.html

Let me know if you get the chance to try it out!
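For reference, here is a minimal sketch of the workflow from the original question, assuming a Ray version that already contains the fix. The helper name `load_best_checkpoint` is hypothetical and only wraps the `ExperimentAnalysis` calls mentioned above; the early path check is an addition for clearer errors, not something Ray requires.

```python
from pathlib import Path


def load_best_checkpoint(experiment_path, metric, mode):
    """Return (best_trial, best_checkpoint) for a copied experiment folder.

    Hypothetical helper: assumes a Ray build that includes the
    moved-storage-path fix discussed in this thread.
    """
    experiment_dir = Path(experiment_path).expanduser()
    # Fail early if the copied folder is not where we expect it on this machine.
    if not experiment_dir.is_dir():
        raise FileNotFoundError(f"Experiment folder not found: {experiment_dir}")

    # Imported here so the cheap path check above runs before Ray is loaded.
    from ray.tune import ExperimentAnalysis

    analysis = ExperimentAnalysis(
        str(experiment_dir), default_metric=metric, default_mode=mode
    )
    # Both calls fall back to the default metric and mode set above.
    best_trial = analysis.get_best_trial()
    best_checkpoint = analysis.get_best_checkpoint(best_trial)
    return best_trial, best_checkpoint
```

The returned checkpoint should then be usable with `Algorithm.from_checkpoint` (from `ray.rllib.algorithms.algorithm`) to rebuild the PPO algorithm, although that last step has not been verified here on a moved experiment.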

Thank you very much. As soon as I try it, I will report back here.
Best

I am facing a similar issue. I created all my trials using Ray 2.6.1 and could analyse them without problems. Now I want to hand off the code to others, but no one else can open my experiments. Even I cannot open them anymore when I create a new virtual environment. (RuntimeError: Can’t return results as experiment has not been run, yet. Call Tuner.fit() to run the experiment first.)

Is there any workaround for this? I tried opening the experiments using both Ray 2.6.1 and the most recent 2.8.1, with the same error each time.

Just re-running the experiments is not an option, so it would be quite painful to lose this data.

For reference, I am using this function to load a given experiment.

from ray import tune


def open_validate_ray_experiment(experiment_path, trainable):
    # Open and read the experiment folder without resuming unfinished trials
    print(f"Loading results from {experiment_path}...")
    restored_tuner = tune.Tuner.restore(
        experiment_path, trainable=trainable, resume_unfinished=False
    )
    result_grid = restored_tuner.get_results()
    print("Done!\n")

    # Check whether any trial ended with an error
    if result_grid.errors:
        print(f"At least one of the {len(result_grid)} trials failed!")
    else:
        print(f"No errors! Number of terminated trials: {len(result_grid)}")

    return restored_tuner, result_grid

The PR above should apply to Ray 2.9+. Let me know if you’re able to upgrade and try it out @lassefschmidt
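Since the fix is only expected in Ray 2.9+, a small guard before calling Tuner.restore can give a clearer message than the RuntimeError quoted above. This is a minimal sketch: the helper name `supports_moved_restore` is hypothetical, and the 2.9 threshold is taken from this thread rather than from a documented guarantee.

```python
def supports_moved_restore(ray_version: str) -> bool:
    """Return True if ray_version is 2.9 or newer.

    The 2.9 cutoff is an assumption based on this thread. Only the
    major and minor components are compared, so suffixes such as
    "rc1" or "dev0" on later components are ignored.
    """
    major, minor = (int(part) for part in ray_version.split(".")[:2])
    return (major, minor) >= (2, 9)


# Typical use (requires Ray to be installed):
#   import ray
#   if not supports_moved_restore(ray.__version__):
#       raise RuntimeError(f"Ray {ray.__version__} predates the restore fix")
```

With the versions mentioned in this thread, the check returns False for 2.6.1 and 2.8.1 and True for 2.9.0.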