Hi there,
I am currently training my algorithm by first creating a tuner instance and then calling tuner.fit():
def get_tuner(
    training_path,
    scheduler,
    param_space,
    stopping_criteria: CombinedStopper,
) -> Tuner:
    app_config = Configuration()
    tuner = tune.Tuner(
        "PPO",
        param_space=param_space,
        run_config=air.RunConfig(
            stop=stopping_criteria,
            checkpoint_config=air.CheckpointConfig(
                checkpoint_frequency=5, checkpoint_at_end=True
            ),
            progress_reporter=CLIReporter(
                max_progress_rows=8,
                max_report_frequency=300,
                print_intermediate_tables=True,
            ),
            local_dir=app_config["agent"]["histogram"]["results_folder_path"],
            name=training_path,
        ),
        tune_config=tune.TuneConfig(scheduler=scheduler, reuse_actors=True),
    )
    return tuner
# tuner = get_tuner(...) as above
tuner.fit()
I am looking for a way to load a checkpoint, recreate the algorithm, and pass it to the Tuner so I can continue training with a different environment configuration, or alternatively a way to transfer the weights of that algorithm to a new one.
I know how to get the weights, but I am stuck on how to transfer them, or on how to call tune.Tuner() with the old algorithm.
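For reference, this is roughly how I get the weights at the moment (just a sketch; the checkpoint path and the assumption of a single default policy are mine):

    from ray.rllib.algorithms.algorithm import Algorithm

    # Restore the trained PPO from its checkpoint directory (placeholder path)
    old_algo = Algorithm.from_checkpoint("folders/checkpoint_00020")

    # Dict of per-policy weights, e.g. {"default_policy": {...}}
    weights = old_algo.get_weights()

What I am missing is the step that puts these weights into whatever algorithm a new tune.Tuner() run will train.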
Thank you!
Hey @Lorenzo_Delpi, instead of passing "PPO" to the Tuner, can you create an RLTrainer instead?
For example:
trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config={
        "env": "CartPole-v0",
        "framework": "tf",
        "evaluation_num_workers": 1,
        "evaluation_interval": 1,
        "evaluation_config": {"input": "sampler"},
    },
    resume_from_checkpoint=...,  # <- load checkpoints here
)
Then pass this trainer to the Tuner:
tuner = tune.Tuner(
    trainer,
    param_space=param_space,
    ...
)
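To fill in the resume_from_checkpoint part, something along these lines should work (a sketch; the path is a placeholder for wherever your previous run stored its checkpoint):

    from ray.air import Checkpoint

    # Point at the checkpoint_<id> directory written by the previous run (placeholder path)
    checkpoint = Checkpoint.from_directory("/path/to/previous_run/checkpoint_000020")

    # ...then pass it to the trainer above via resume_from_checkpoint=checkpoint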
Hi @yunxuanx,
Sorry for replying only now, and thank you for your help. I tried your proposed solution, but it seems the weights are somehow not loaded: I get a very low reward at the start of the run, a value that is not compatible with what was achieved in the checkpoint.
I am loading the checkpoint as follows, where checkpoint_path points directly at the checkpoint_id folder, not at the parent folder that contains all the checkpoints:
checkpoint_path = "folders/checkpoint_00020"
checkpoint = Checkpoint.from_directory(checkpoint_path)

trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config=dict(alg_config),
    resume_from_checkpoint=checkpoint,  # <- load checkpoint here
)
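As a fallback I am also considering restoring the algorithm myself and copying its weights into a freshly built one, roughly like this (untested sketch; the new environment name is a placeholder):

    from ray.rllib.algorithms.algorithm import Algorithm
    from ray.rllib.algorithms.ppo import PPOConfig

    # Restore the old run and grab its weights ({policy_id: weights})
    old_algo = Algorithm.from_checkpoint(checkpoint_path)
    weights = old_algo.get_weights()

    # Build a fresh PPO with the new environment configuration and copy the weights over
    new_algo = PPOConfig().environment(env="MyNewEnv").build()
    new_algo.set_weights(weights)

But I am not sure how I would then hand this algorithm back to tune.Tuner().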
Am I doing something wrong? Thank you for your time.
Hello @Lorenzo_Delpi and @yunxuanx,
I am experiencing the same problem. I have an environment with a fixed episode length whose maximum reward is always 2. When doing the following:
trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config=my_custom_config,
)
result = trainer.fit()

trainer2 = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config=my_custom_config,
    resume_from_checkpoint=result.checkpoint,
)
result2 = trainer2.fit()
When checking the training plots of results 1 and 2 (with resultX.metrics_dataframe.plot("training_iteration", "episode_reward_mean")), I see that trainer2 always behaves the same way as trainer1: it starts with a reward of 0 and then climbs to the max reward. I don't understand why. Since trainer2 loads the last checkpoint of trainer1, I would expect it not to start at reward 0 but roughly at the reward trainer1 had already reached (the environment is the same for both trainers).
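In case it is useful for debugging, this is how I compare the weights stored in the two final checkpoints (a sketch; I am assuming a single default policy and that Algorithm.from_checkpoint can read the checkpoints produced by RLTrainer):

    import numpy as np
    from ray.rllib.algorithms.algorithm import Algorithm

    # Restore both runs from their final checkpoints
    algo1 = Algorithm.from_checkpoint(result.checkpoint)
    algo2 = Algorithm.from_checkpoint(result2.checkpoint)

    w1 = algo1.get_policy().get_weights()
    w2 = algo2.get_policy().get_weights()

    # Rough check: print how far apart the final weights of the two runs are.
    # If trainer2 had really continued from trainer1, I would expect its weights
    # to be a continuation of w1 rather than a fresh initialization.
    for name in w1:
        print(name, float(np.abs(w1[name] - w2[name]).mean()))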
Could you please help me?
Thanks,
Olivia