Why is the setup() method called multiple times during PBT?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I implemented a custom tune.Trainable class with all the necessary methods: setup(), reset_config(), …

It is an RL DQN agent. In the setup() method I initialize things like the replay buffer and the episode counter, which I only want to initialize once. However, I found that this method is called multiple times during Population Based Training, mostly right after cleanup(). I am not sure what I should clean up there, so cleanup() is an empty function. Sometimes setup() is also called after reset_config(), where I reset the agent's config but not attributes such as the replay buffer or the episode counter. reset_config() always returns True.
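The situation described above can be reproduced in a small, Ray-free sketch. Everything in it is an assumption made for illustration: `TrainableSketch` is a hypothetical stand-in that only borrows the method names of a tune.Trainable subclass, it is not the actual Custom_DQN class, and the deque-based replay buffer and dummy transition are placeholders. The script calls the lifecycle methods by hand in the order observed (cleanup() followed by setup()) and records what happens to the state.

```python
from collections import deque


class TrainableSketch:
    """Hypothetical stand-in that mirrors the method names of a
    tune.Trainable subclass; it does not depend on Ray."""

    def __init__(self, config):
        self.calls = []  # records the order of lifecycle calls
        self.setup(config)

    def setup(self, config):
        self.calls.append("setup")
        self.config = dict(config)
        # State that is meant to be initialized only once (assumed types).
        self.replay_buffer = deque(maxlen=10_000)
        self.episodes = 0

    def step(self):
        self.calls.append("step")
        self.replay_buffer.append(("obs", "action", 0.0))  # dummy transition
        self.episodes += 1
        return {"episodes": self.episodes}

    def reset_config(self, new_config):
        self.calls.append("reset_config")
        self.config = dict(new_config)  # buffer and counter are left untouched
        return True

    def cleanup(self):
        self.calls.append("cleanup")  # empty, as in the description above


agent = TrainableSketch({"lr": 0.001})
agent.step()

# reset_config() alone keeps the buffer and the episode counter.
agent.reset_config({"lr": 0.002})
state_after_reset = (len(agent.replay_buffer), agent.episodes)

# cleanup() followed by another setup() wipes both.
agent.cleanup()
agent.setup(agent.config)
state_after_setup = (len(agent.replay_buffer), agent.episodes)

print(agent.calls)
print("after reset_config:", state_after_reset)
print("after second setup:", state_after_setup)
```

In this sketch the state survives reset_config() but is re-initialized by the second setup() call, which matches the behavior observed during PBT.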

This is how I use the tune.Tuner:

from ray import train, tune
from ray.tune.schedulers import PopulationBasedTraining

# Config and Custom_DQN are project-specific (not part of Ray).
config = Config('config/config_cartpole_1st_order.yaml').load_config()

# Hyperparameters that PBT is allowed to perturb.
hyperparam_mutations = {
    'lr': tune.uniform(0.0001, 0.01),
    'gamma': tune.uniform(0.9, 0.999),
    'tau': tune.uniform(0.001, 0.01),
}

pbt = PopulationBasedTraining(
    time_attr='training_iteration',
    perturbation_interval=2000,
    burn_in_period=2000,
    hyperparam_mutations=hyperparam_mutations,
)

# Keys are metrics reported by Custom_DQN.
stopping_criteria = {
    'episodes_mean_reward': 200,
    'steps': 30_000,
}

tuner = tune.Tuner(
    Custom_DQN,
    run_config=train.RunConfig(
        name='PBT_CartPole',
        stop=stopping_criteria,
        verbose=1,
    ),
    tune_config=tune.TuneConfig(
        scheduler=pbt,
        num_samples=2,
        metric='episodes_mean_reward',
        mode='max',
        reuse_actors=True,
    ),
    param_space=config,
)

results = tuner.fit()
print("best hyperparameters: ", results.get_best_result().config)

My expectation is that only reset_config(), not setup(), is called after a checkpoint is loaded.