Saving best checkpoint - tune is saving first iterations instead

Hi all, I’m trying to checkpoint only the best iterations of my model, but when I inspect the trial directory, only the first 5 checkpoints (because of keep_checkpoints_num=5) and the last one are saved, like so:

checkpoint_010001  checkpoint_010003  checkpoint_010005  events.out.tfevents.1634291196.LAPTOP-7VGTS0VK  params.pkl    result.json
checkpoint_010002  checkpoint_010004  checkpoint_013663  params.json                                     progress.csv

My tune.run call:

        scheduler = AsyncHyperBandScheduler(
            time_attr="training_iteration",
            grace_period=5 * 60,
            max_t=1000000 * 60,
        )

        print("Training automatically with Ray Tune")
        analysis = tune.run(
            args.run,
            config=config,
            stop=stop,
            checkpoint_freq=1,
            keep_checkpoints_num=5,
            checkpoint_score_attr="episode_reward_mean",
            metric="episode_reward_mean",
            mode="max",
            callbacks=[
                WandbLoggerCallback(
                    group=name_run(config, ""),
                    api_key_file=".wandb_api_key",
                    project="egt-rl",
                ),
            ],
            scheduler=scheduler,
            name=name_run(config, ""),
        )

Any idea why this is happening? The intended behavior is to keep the 5 best checkpoints by episode_reward_mean, plus the last one.
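
For reference, this is roughly how I'd expect to pull the best checkpoint out of the run afterwards (a minimal sketch assuming Ray 1.x; analysis is the object returned by tune.run above, the variable names are just illustrative):

    # Best trial according to the metric/mode passed to tune.run
    best_trial = analysis.best_trial

    # Path to that trial's best checkpoint, scored by episode_reward_mean
    best_ckpt = analysis.get_best_checkpoint(
        best_trial, metric="episode_reward_mean", mode="max"
    )
    print(best_ckpt)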

What kind of trainable are you training (or, if you're using RLlib, which environment)? Does your run converge (i.e. are you seeing higher rewards over time)?