Re-train algorithm from checkpoint with tuner.fit()

Hi there,

I am currently training my algorithm by first creating a tuner instance and then calling tuner.fit():

from ray import air, tune
from ray.tune import CLIReporter, Tuner
from ray.tune.stopper import CombinedStopper


def get_tuner(
    training_path,
    scheduler,
    param_space,
    stopping_criteria: CombinedStopper,
) -> Tuner:
    app_config = Configuration()
    tuner = tune.Tuner(
        "PPO",
        param_space=param_space,
        run_config=air.RunConfig(
            stop=stopping_criteria,
            checkpoint_config=air.CheckpointConfig(
                checkpoint_frequency=5, checkpoint_at_end=True
            ),
            progress_reporter=CLIReporter(
                max_progress_rows=8,
                max_report_frequency=300,
                print_intermediate_tables=True,
            ),
            local_dir=app_config["agent"]["histogram"]["results_folder_path"],
            name=training_path,
        ),
        tune_config=tune.TuneConfig(scheduler=scheduler, reuse_actors=True),
    )

    return tuner

# tuner created via get_tuner(...) above
tuner.fit()

I am looking for a way to load a checkpoint, recreate the algorithm, and pass it to the tuner so that I can train with a different environment configuration, or alternatively a way to just transfer the weights of that algorithm to another one.

I know how to get the weights, but I am stuck on how to transfer them or how to call tune.Tuner() with the old algorithm.
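For context, this is roughly the sketch I have in mind for the weight transfer (untested; "MyEnv-v0", old_env_config, new_env_config and checkpoint_path are just placeholders for my actual setup):

from ray.rllib.algorithms.ppo import PPOConfig

# Placeholders for my actual setup:
checkpoint_path = "results/.../checkpoint_00020"  # folder of a single checkpoint
old_env_config = {}                               # env config used for the first training
new_env_config = {}                               # env config I want to train with next

# 1) Rebuild the old algorithm and restore the checkpoint weights.
old_algo = (
    PPOConfig()
    .environment(env="MyEnv-v0", env_config=old_env_config)
    .build()
)
old_algo.restore(checkpoint_path)
weights = old_algo.get_policy().get_weights()

# 2) Build a fresh algorithm with the new environment configuration
#    and copy the weights over.
new_algo = (
    PPOConfig()
    .environment(env="MyEnv-v0", env_config=new_env_config)
    .build()
)
new_algo.get_policy().set_weights(weights)

The part I do not know is how to hand new_algo (or its weights) to tune.Tuner() afterwards.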

Thank you!

Hey @Lorenzo_Delpi, instead of passing "PPO" to Tuner, can you create an RLTrainer instead?

For example:

from ray.air import RunConfig, ScalingConfig
from ray.train.rl import RLTrainer

trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config={
        "env": "CartPole-v0",
        "framework": "tf",
        "evaluation_num_workers": 1,
        "evaluation_interval": 1,
        "evaluation_config": {"input": "sampler"},
    },
    resume_from_checkpoint=checkpoint,  # <- load your checkpoint here (see below)
)

Then pass this trainer to the Tuner:

tuner = tune.Tuner(
    trainer,
    param_space=param_space,
    ...
)
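To get the checkpoint object to pass to resume_from_checkpoint, you can wrap the checkpoint folder with an AIR Checkpoint (the path below is just a placeholder):

from ray.air import Checkpoint

# Wrap an existing checkpoint folder so it can be passed to
# resume_from_checkpoint above (the path is a placeholder).
checkpoint = Checkpoint.from_directory("/path/to/my_experiment/checkpoint_000005")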

Hi @yunxuanx,

Sorry for replying only now, and thank you for your help. I tried your proposed solution, but it seems that the weights are somehow not loaded, since I get a very low reward at the start of the first iteration, a value that is not compatible with what was achieved in the checkpoint.

I am loading the checkpoint in the following way, where checkpoint_path points directly to the individual checkpoint folder (e.g. checkpoint_00020), not to the parent folder that contains all the checkpoints:

from ray.air import Checkpoint, RunConfig, ScalingConfig
from ray.train.rl import RLTrainer

checkpoint_path = "folders/checkpoint_00020"
checkpoint = Checkpoint.from_directory(checkpoint_path)

trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config=dict(alg_config),
    resume_from_checkpoint=checkpoint,  # <- load checkpoints here
)

Am I doing something wrong? Thank you for your time.

Hello @Lorenzo_Delpi and @yunxuanx,

I am experiencing the same problem. I have an environment whose max reward is always 2 and which has a fixed episode length. I am doing the following:

trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config=my_custom_config,
)
result = trainer.fit()

trainer2 = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm=predictor,
    config=my_custom_config,
    resume_from_checkpoint=result.checkpoint,
)
result2 = trainer2.fit()

When checking the training plots of results 1 and 2 (with the following line: resultX.metrics_dataframe.plot("training_iteration", "episode_reward_mean")), I see that trainer2 always behaves the same way as trainer1 (it starts with a reward of 0 and then reaches the max reward). I don't understand why. In my opinion, since trainer2 loads the last checkpoint of trainer1, it should not start with a reward of 0 but with more or less the same reward that trainer1 ended with (the environment is the same for both trainers).
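To make my expectation concrete, here is a quick check on the metrics dataframes from the snippet above (just a sketch of what I would expect to see if the checkpoint were really restored):

# Same result / result2 objects as in the snippet above.
# If the checkpoint weights were really restored, the first iteration of
# trainer2 should already be close to the last iteration of trainer1.
last_reward_run1 = result.metrics_dataframe["episode_reward_mean"].iloc[-1]
first_reward_run2 = result2.metrics_dataframe["episode_reward_mean"].iloc[0]

print("trainer1 final reward:", last_reward_run1)   # reaches the max reward (2)
print("trainer2 first reward:", first_reward_run2)  # observed: ~0, expected: ~2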

Could you please help me?

Thanks,

Olivia