Hi there,
I am currently training my algorithm by first creating a tuner instance and then calling tuner.fit():
def get_tuner(
    training_path,
    scheduler,
    param_space,
    stopping_criteria: CombinedStopper,
) -> Tuner:
    app_config = Configuration()
    tuner = tune.Tuner(
        "PPO",
        param_space=param_space,
        run_config=air.RunConfig(
            stop=stopping_criteria,
            checkpoint_config=air.CheckpointConfig(
                checkpoint_frequency=5, checkpoint_at_end=True
            ),
            progress_reporter=CLIReporter(
                max_progress_rows=8,
                max_report_frequency=300,
                print_intermediate_tables=True,
            ),
            local_dir=app_config["agent"]["histogram"]["results_folder_path"],
            name=training_path,
        ),
        tune_config=tune.TuneConfig(scheduler=scheduler, reuse_actors=True),
    )
    return tuner
# tuner = get_tuner(...) as above
tuner.fit()
I am looking for a way to load a checkpoint, recreate the algorithm, and pass it to the Tuner so I can continue training with a different environment configuration, or alternatively a way to transfer the weights of that algorithm to a new one.
I know how to get the weights, but I am stuck on how to transfer them, or on how to call tune.Tuner() with the old algorithm.
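For reference, this is roughly how I get the weights at the moment (just a sketch; the checkpoint path and the assumption of a single default policy are mine):

    from ray.rllib.algorithms.algorithm import Algorithm

    # Restore the trained PPO from its checkpoint directory (placeholder path)
    old_algo = Algorithm.from_checkpoint("folders/checkpoint_00020")

    # Dict of per-policy weights, e.g. {"default_policy": {...}}
    weights = old_algo.get_weights()

What I am missing is the step that puts these weights into whatever algorithm a new tune.Tuner() run will train.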
Thank you!
Hey @Lorenzo_Delpi, instead of passing "PPO" to the Tuner, can you create an RLTrainer instead?
For example:
trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config={
        "env": "CartPole-v0",
        "framework": "tf",
        "evaluation_num_workers": 1,
        "evaluation_interval": 1,
        "evaluation_config": {"input": "sampler"},
    },
    resume_from_checkpoint=...,  # <- load checkpoints here
)
Then pass this trainer to the Tuner:
tuner = tune.Tuner(
    trainer,
    param_space=param_space,
    ...
)
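To fill in the resume_from_checkpoint part, something along these lines should work (a sketch; the path is a placeholder for wherever your previous run stored its checkpoint):

    from ray.air import Checkpoint

    # Point at the checkpoint_<id> directory written by the previous run (placeholder path)
    checkpoint = Checkpoint.from_directory("/path/to/previous_run/checkpoint_000020")

    # ...then pass it to the trainer above via resume_from_checkpoint=checkpoint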
Hi @yunxuanx,
Sorry for replying only now, and thank you for your help. I tried your proposed solution, but it seems the weights are somehow not loaded: I get a very low reward at the start of the run, a value that is not compatible with what was achieved in the checkpoint.
I am loading the checkpoint as follows, where checkpoint_path points directly at the checkpoint_id folder, not at the parent folder that contains all the checkpoints:
checkpoint_path = "folders/checkpoint_00020"
checkpoint = Checkpoint.from_directory(checkpoint_path)

trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config=dict(alg_config),
    resume_from_checkpoint=checkpoint,  # <- load checkpoint here
)
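As a fallback I am also considering restoring the algorithm myself and copying its weights into a freshly built one, roughly like this (untested sketch; the new environment name is a placeholder):

    from ray.rllib.algorithms.algorithm import Algorithm
    from ray.rllib.algorithms.ppo import PPOConfig

    # Restore the old run and grab its weights ({policy_id: weights})
    old_algo = Algorithm.from_checkpoint(checkpoint_path)
    weights = old_algo.get_weights()

    # Build a fresh PPO with the new environment configuration and copy the weights over
    new_algo = PPOConfig().environment(env="MyNewEnv").build()
    new_algo.set_weights(weights)

But I am not sure how I would then hand this algorithm back to tune.Tuner().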
Am I doing something wrong? Thank you for your time.
Hello @Lorenzo_Delpi and @yunxuanx,
I am experiencing the same problem. I have an environment with a fixed episode length whose maximum reward is always 2. When doing the following:
trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config=my_custom_config,
)
result = trainer.fit()

trainer2 = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config=my_custom_config,
    resume_from_checkpoint=result.checkpoint,
)
result2 = trainer2.fit()
When checking the training plots of results 1 and 2 (with resultX.metrics_dataframe.plot("training_iteration", "episode_reward_mean")), I see that trainer2 always behaves the same way as trainer1: it starts with a reward of 0 and then climbs to the max reward. I don't understand why. Since trainer2 loads the last checkpoint of trainer1, I would expect it not to start at reward 0 but roughly at the reward trainer1 had already reached (the environment is the same for both trainers).
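In case it is useful for debugging, this is how I compare the weights stored in the two final checkpoints (a sketch; I am assuming a single default policy and that Algorithm.from_checkpoint can read the checkpoints produced by RLTrainer):

    import numpy as np
    from ray.rllib.algorithms.algorithm import Algorithm

    # Restore both runs from their final checkpoints
    algo1 = Algorithm.from_checkpoint(result.checkpoint)
    algo2 = Algorithm.from_checkpoint(result2.checkpoint)

    w1 = algo1.get_policy().get_weights()
    w2 = algo2.get_policy().get_weights()

    # Rough check: print how far apart the final weights of the two runs are.
    # If trainer2 had really continued from trainer1, I would expect its weights
    # to be a continuation of w1 rather than a fresh initialization.
    for name in w1:
        print(name, float(np.abs(w1[name] - w2[name]).mean()))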
Could you please help me?
Thanks,
Olivia