Hello, I am training a PPO agent with RLlib on a Windows machine, and then I copy the experiment folder to a different machine running Linux for testing purposes.
To get the analysis object I perform the following operation, hoping to be able to call get_best_trial() and get_best_checkpoint() and thereafter build the PPO algorithm from the checkpoint:
Thank you very much @justinvyu
I am glad the issue has been identified.
I had been trying to solve this for days before moving on, since if I trained models on the remote machine I could not load the checkpoints on my local machine to test, analyse, etc.
I face a similar issue. I created all my trials using Ray 2.6.1 and could analyse them without problems. Now I want to hand off the code to others, but no one can open my experiments. Even I cannot open them anymore once I create a new virtual environment. (RuntimeError: Can't return results as experiment has not been run, yet. Call Tuner.fit() to run the experiment first.)
Is there any workaround for this? I tried opening the experiments with both Ray 2.6.1 and the most recent 2.8.1 and got the same error each time.
Just re-running the experiments is not an option, so it would be quite painful to lose this data.
For reference, I am using this function to load a given experiment.
from ray import tune


def open_validate_ray_experiment(experiment_path, trainable):
    # Open and read the experiment folder
    print(f"Loading results from {experiment_path}...")
    restored_tuner = tune.Tuner.restore(
        experiment_path, trainable=trainable, resume_unfinished=False
    )
    result_grid = restored_tuner.get_results()
    print("Done!\n")

    # Check if there have been errors
    if result_grid.errors:
        print(f"At least one of the {len(result_grid)} trials failed!")
    else:
        print(f"No errors! Number of terminated trials: {len(result_grid)}")
    return restored_tuner, result_grid
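
In case it helps anyone stuck in the same situation: a minimal sketch for getting at the logged metrics without going through Tuner.restore at all. It assumes each trial subfolder of the experiment still contains the result.json file that Tune's default JSON logger writes (one JSON object per reported result, one per line). The helper names load_trial_results and best_trial are made up for illustration and are not part of Ray, and this is not a fix for the restore error itself.

```python
import json
from pathlib import Path


def load_trial_results(experiment_path):
    """Hypothetical helper: map each trial folder name to its list of reported results.

    Assumes the layout <experiment_path>/<trial_folder>/result.json,
    with one JSON object per line.
    """
    results = {}
    for result_file in sorted(Path(experiment_path).glob("*/result.json")):
        rows = []
        with result_file.open() as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    rows.append(json.loads(line))
        results[result_file.parent.name] = rows
    return results


def best_trial(results, metric, mode="max"):
    """Hypothetical helper: return (trial_name, best_value) for the given metric.

    Each trial is scored by its best reported value, not its last one.
    """
    pick = max if mode == "max" else min
    scored = {}
    for name, rows in results.items():
        values = [row[metric] for row in rows if metric in row]
        if values:
            scored[name] = pick(values)
    if not scored:
        raise ValueError(f"No trial reported metric {metric!r}")
    best_name = pick(scored, key=scored.get)
    return best_name, scored[best_name]
```

Something like best_trial(load_trial_results(experiment_path), "episode_reward_mean") would then point you to the trial folder whose checkpoint subfolders are worth inspecting. Whether Algorithm.from_checkpoint can load a checkpoint that was copied across operating systems or Ray versions is a separate question that I have not verified.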