Selecting the best checkpoint to keep training in Tune

O.S.Tov asked this question on the Slack channel.

Please do not use the Slack channel anymore for questions on RLlib! All discussions should be moved here for better searchability and documentation of issues and questions. Thank you.


Hi, I’ve been training a PPO agent in a Jupyter notebook using analysis = tune.run(), with checkpoint_freq=10 and checkpoint_at_end=True as arguments. I want to load the best agent without using the ‘analysis’ object (via .get_best_trial), since that object is only available in the current Python session. I’ve tried to instantiate a new ExperimentAnalysis object by providing the path to the .json file representing that specific run, but I cannot use .get_best_trial the same way. There seems to be a bit of confusion about the whole procedure of saving and loading agents, so if someone could clarify the process I think it would be very helpful to the community. Thanks!

from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

# Run the training, writing a checkpoint every 10 iterations and one at the end.
analysis = tune.run(
    PPOTrainer,
    stop=stop,
    config=config,
    local_dir=log_dir,
    checkpoint_at_end=True,
    checkpoint_freq=10,
    name='PPO_run_1')

# Look up the checkpoints of the best trial and restore the agent from one of them.
checkpoints = analysis.get_trial_checkpoints_paths(
    trial=analysis.get_best_trial('episode_reward_mean', mode='max'),
    metric='episode_reward_mean')
checkpoint_path = checkpoints[0][0]

agent = PPOTrainer(config=config, env='TradingEnv')
agent.restore(checkpoint_path)

This is the code I’ve been using so far; if for some reason the Python session is no longer available, the “analysis” object cannot be used anymore.
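
For reference, a minimal sketch of the approach described in the question, i.e. re-creating the analysis object from the files Tune writes to disk rather than from the live session. It assumes the same config, log_dir, and run name as the snippet above; Tune names its experiment-state file experiment_state-<timestamp>.json inside local_dir/name, and the exact ExperimentAnalysis constructor arguments can differ between Ray versions:

import glob
import os

from ray.tune import ExperimentAnalysis
from ray.rllib.agents.ppo import PPOTrainer

# Tune writes an experiment_state-<timestamp>.json file into local_dir/name;
# glob for it rather than hard-coding the timestamp.
experiment_state = glob.glob(
    os.path.join(log_dir, 'PPO_run_1', 'experiment_state-*.json'))[0]

# Rebuild the analysis object from disk instead of relying on the live session.
analysis = ExperimentAnalysis(experiment_state)

# Same lookup as before, now independent of the original Python session.
best_trial = analysis.get_best_trial('episode_reward_mean', mode='max')
checkpoints = analysis.get_trial_checkpoints_paths(
    trial=best_trial,
    metric='episode_reward_mean')
checkpoint_path = checkpoints[0][0]

agent = PPOTrainer(config=config, env='TradingEnv')
agent.restore(checkpoint_path)

Depending on the Ray version, analysis.get_best_checkpoint(trial, metric, mode) may also be available as a shortcut for the two-step checkpoint lookup.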