I successfully trained a PPO agent (let's call the agent Jerry) on one part of my task, while the other part was handled by a computation-heavy simulator (Scenario 1).
Now I want to use Jerry (weights frozen) for the first part of the task and train a new PPO agent for the second part (Scenario 2). Note that both parts happen within a single time step.
The problem is that when I run ray.tune.run for Scenario 2, I try to restore Jerry in the __init__ of the custom model. As soon as the line
agent = ppo.PPOTrainer(env=Grid_Gym, config=config_params) is reached, everything freezes.
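For context, here is a minimal sketch of what the custom model does, assuming the old ray.rllib.agents API. Grid_Gym and config_params are the same objects as above; ScenarioTwoModel, JERRY_CHECKPOINT, and the forward body are placeholders for my actual code:

```python
import torch.nn as nn
import ray.rllib.agents.ppo as ppo
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class ScenarioTwoModel(TorchModelV2, nn.Module):
    """Custom model for Scenario 2 that wraps a frozen copy of Jerry."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        # Restore Jerry inside the model: this PPOTrainer call is the
        # line where everything freezes.
        agent = ppo.PPOTrainer(env=Grid_Gym, config=config_params)
        agent.restore(JERRY_CHECKPOINT)  # path to Jerry's Scenario 1 checkpoint
        self.jerry_policy = agent.get_policy()  # kept frozen, never trained

    def forward(self, input_dict, state, seq_lens):
        # Jerry handles the first half of the time step; the trainable head
        # handles the second half (omitted for brevity).
        raise NotImplementedError
```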
My hypothesis is that the number of workers is at the heart of the issue: for Scenario 2 I supply a certain
num_workers, but those workers are all claimed by the tune run, so there are none left over to restore Jerry.
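For reference, this is roughly how I launch Scenario 2 (the num_workers value and the registered model name are illustrative):

```python
import ray
from ray import tune
from ray.rllib.models import ModelCatalog

ray.init()
ModelCatalog.register_custom_model("scenario_two_model", ScenarioTwoModel)

tune.run(
    "PPO",
    config={
        "env": Grid_Gym,
        "num_workers": 4,  # all claimed by the tune trial, so the nested
                           # PPOTrainer in the model's __init__ appears to
                           # hang waiting for workers of its own
        "model": {"custom_model": "scenario_two_model"},
    },
)
```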
Any help would be greatly appreciated, as this blocks me from using Ray to solve the task and I cannot find a way around it. cc @sven1977