How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I implemented a custom `tune.Trainable` class with all the necessary methods: `setup()`, `reset_config()`, etc. It is an RL DQN agent. In the `setup()` method I initialize things like the replay buffer and the episode counter, which I only want to set up once. However, I noticed that this method gets called multiple times during Population Based Training, mostly right after `cleanup()`. I am not sure what needs to be cleaned up here, so `cleanup()` is an empty function. Sometimes `setup()` is also called after `reset_config()`, where I reset the agent's config but not attributes like the replay buffer or the episode counter. `reset_config()` always returns `True`.
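For context, here is a simplified sketch of how the Trainable is structured. The attribute names and the placeholder `step()` body are illustrative only; the real class contains the full DQN logic:

```python
from ray import tune


class Custom_DQN(tune.Trainable):
    def setup(self, config):
        # One-time initialization: these should only be created once per actor.
        # (Attribute names are illustrative, not the exact implementation.)
        self.replay_buffer = []
        self.episodes = 0
        self.lr = config['lr']
        self.gamma = config['gamma']
        self.tau = config['tau']

    def step(self):
        # ... run one training iteration and report metrics ...
        return {'episodes_mean_reward': 0.0, 'steps': 0}

    def reset_config(self, new_config):
        # Only hyperparameters are reset here; the replay buffer and the
        # episode counter are intentionally left untouched.
        self.lr = new_config['lr']
        self.gamma = new_config['gamma']
        self.tau = new_config['tau']
        return True

    def cleanup(self):
        # Nothing to clean up, so this is left empty.
        pass
```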
This is how I use the `tune.Tuner`:
```python
from ray import train, tune
from ray.tune.schedulers import PopulationBasedTraining

# Config is my own YAML config loader; Custom_DQN is the custom Trainable.
config = Config('config/config_cartpole_1st_order.yaml').load_config()

hyperparam_mutations = {
    'lr': tune.uniform(0.0001, 0.01),
    'gamma': tune.uniform(0.9, 0.999),
    'tau': tune.uniform(0.001, 0.01),
}

pbt = PopulationBasedTraining(
    time_attr='training_iteration',
    perturbation_interval=2000,
    burn_in_period=2000,
    hyperparam_mutations=hyperparam_mutations,
)

stopping_criteria = {
    'episodes_mean_reward': 200,
    'steps': 30_000,
}

tuner = tune.Tuner(
    Custom_DQN,
    run_config=train.RunConfig(
        name='PBT_CartPole',
        stop=stopping_criteria,
        verbose=1,
    ),
    tune_config=tune.TuneConfig(
        scheduler=pbt,
        num_samples=2,
        metric='episodes_mean_reward',
        mode='max',
        reuse_actors=True,
    ),
    param_space=config,
)

results = tuner.fit()
print("best hyperparameters: ", results.get_best_result().config)
```
My expectation is that only the `reset_config()` method should be called after a checkpoint is loaded, not `setup()`.