Tuner cannot restore the checkpoints!

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi all,

I am using Tuner and Tuner.fit() to train a PPO agent on my custom env.

I can train the agent for some time without any errors. Now I would like to restore the checkpoints for further training; however, it gives me the following error:

*** RuntimeError: Could not find Tuner state in restore directory. Did you pass the correct path (including experiment directory?)

This is what my checkpoint directory looks like:

This is my code:

tuner = tune.Tuner(
    "PPO",
    run_config=run_config,
    param_space=param_space,
)

chkpt_path = "/home/PPO/PPO_MasterEnv_214c3_00000_0_2023-06-01_15-13-42/checkpoint_001300/"

tuner.restore(chkpt_path)

results = tuner.fit()

What is stranger is that I can restore the checkpoint directly on the algorithm and use the model for prediction or further training, like this:

algo = param_space.build()
chkpt_path = "/home/PPO/PPO_MasterEnv_214c3_00000_0_2023-06-01_15-13-42/checkpoint_001300/"
algo.restore(chkpt_path)
algo.train()

But I would like to use the Tuner, and this gives me that error!

Neither the RLlib documentation nor this example shows how to restore checkpoints for Tuner.

Can anyone help me to fix this?

Thanks!

Hi @deepgravity , to restore a tuner, you have to pass the experiment dir path (/home/PPO), instead of the checkpoint path.
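
For anyone landing here later, a minimal sketch of that fix, under a few assumptions: the directory layout is <experiment_dir>/<trial_dir>/<checkpoint_dir> as in the paths above; the helper names experiment_dir_from_checkpoint and restore_and_continue are made up for illustration; and the trainable argument is only accepted by newer Ray 2.x releases (older ones may take the path alone, so check the signature for your version). Tuner.restore is a classmethod that returns a new Tuner, so the sketch uses its return value instead of calling restore on an existing tuner instance.

```python
from pathlib import PurePosixPath


def experiment_dir_from_checkpoint(checkpoint_path: str) -> str:
    """Return the experiment directory two levels above a checkpoint directory.

    Assumes the layout <experiment_dir>/<trial_dir>/<checkpoint_dir>,
    which matches the paths shown in this thread.
    """
    checkpoint = PurePosixPath(checkpoint_path)
    return str(checkpoint.parent.parent)


def restore_and_continue(checkpoint_path: str):
    """Restore the whole Tune experiment and continue it (needs Ray installed)."""
    # Imported lazily so the path helper above works without Ray.
    from ray import tune

    experiment_dir = experiment_dir_from_checkpoint(checkpoint_path)
    # Pass the experiment dir, not the checkpoint dir, and keep the
    # returned Tuner. The `trainable` argument is an assumption about
    # the installed Ray version; drop it if your version rejects it.
    tuner = tune.Tuner.restore(experiment_dir, trainable="PPO")
    return tuner.fit()


chkpt_path = "/home/PPO/PPO_MasterEnv_214c3_00000_0_2023-06-01_15-13-42/checkpoint_001300/"
print(experiment_dir_from_checkpoint(chkpt_path))
```

With the checkpoint path from the question, the helper resolves to /home/PPO, which is the directory the reply above says to pass.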


Hi @yunxuanx , thank you for your reply. Yes, I finally figured it out! It is really strange, though. If I remember correctly, in Ray 1.x we passed the checkpoint dir path!