I successfully trained a PPO agent (let's call the agent Jerry) on one part of my task, while the other part was handled by a computation-heavy simulator (Scenario 1).
Now I want to use Jerry (weights frozen) for the first part of the task and train a new PPO agent for the second part (Scenario 2). Note that both parts happen within a single time step.
The problem is that when I run ray.tune.run for Scenario 2, I try to restore Jerry in the __init__ of the custom model. As soon as the line
agent = ppo.PPOTrainer(env=Grid_Gym, config=config_params) is reached, everything freezes.
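For context, here is a minimal sketch of what the custom model does, assuming the old ray.rllib.agents API. Grid_Gym and config_params are the same objects as above; ScenarioTwoModel, JERRY_CHECKPOINT, and the forward body are placeholders for my actual code:

```python
import torch.nn as nn
import ray.rllib.agents.ppo as ppo
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class ScenarioTwoModel(TorchModelV2, nn.Module):
    """Custom model for Scenario 2 that wraps a frozen copy of Jerry."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        # Restore Jerry inside the model: this PPOTrainer call is the
        # line where everything freezes.
        agent = ppo.PPOTrainer(env=Grid_Gym, config=config_params)
        agent.restore(JERRY_CHECKPOINT)  # path to Jerry's Scenario 1 checkpoint
        self.jerry_policy = agent.get_policy()  # kept frozen, never trained

    def forward(self, input_dict, state, seq_lens):
        # Jerry handles the first half of the time step; the trainable head
        # handles the second half (omitted for brevity).
        raise NotImplementedError
```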
My hypothesis is that the number of workers is at the heart of the issue: for Scenario 2 I supply a certain
num_workers, but those workers are all claimed by the tune run, so there are none left over to restore Jerry.
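For reference, this is roughly how I launch Scenario 2 (the num_workers value and the registered model name are illustrative):

```python
import ray
from ray import tune
from ray.rllib.models import ModelCatalog

ray.init()
ModelCatalog.register_custom_model("scenario_two_model", ScenarioTwoModel)

tune.run(
    "PPO",
    config={
        "env": Grid_Gym,
        "num_workers": 4,  # all claimed by the tune trial, so the nested
                           # PPOTrainer in the model's __init__ appears to
                           # hang waiting for workers of its own
        "model": {"custom_model": "scenario_two_model"},
    },
)
```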
Any help would be greatly appreciated, as this blocks me from using Ray to solve the task and I cannot find a way around it. cc @sven1977