I’m using PBT to explore both typical hyperparameters and architectural ones:
from ray import train, tune
from ray.tune.schedulers import PopulationBasedTraining

config = {
    "lr": tune.uniform(1e-6, 1e-3),
    "weight_decay": tune.uniform(0.01, 0.2),
    # Architectural hyperparameters I also want explored:
    "vision_layers": tune.choice([4, 8, 10]),
    "vision_width": tune.choice([32, 64, 128]),
    "vision_patch_size": tune.choice([8, 12]),
}

scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="val_loss",
    mode="min",
    perturbation_interval=1,
    # Only the training hyperparameters are mutated; the architectural
    # params should come from fresh samples of the param space.
    hyperparam_mutations={
        "lr": tune.loguniform(1e-5, 1e-3),
        "weight_decay": tune.loguniform(1e-6, 1e-4),
    },
)

tuner = tune.Tuner(
    trainable=TrainMetricsPBT,  # my trainable, defined elsewhere
    run_config=train.RunConfig(
        stop=stop_fn,  # stopping criterion, defined elsewhere
        checkpoint_config=train.CheckpointConfig(
            checkpoint_score_attribute="val_loss",
            checkpoint_score_order="min",
            num_to_keep=4,
        ),
    ),
    tune_config=tune.TuneConfig(
        reuse_actors=True,
        scheduler=scheduler,
        max_concurrent_trials=6,
        num_samples=20,
    ),
    param_space=config,
)
Since my concurrency is lower than my sample size, I expected the multiplexing with PBT to explore other structural hyperparameter combinations (fresh samples from the full param space) before the usual exploit/explore cycle of hyperparameter perturbations. Instead, it only seems to sample the full param space for the first max_concurrent_trials, i.e. just 6 configurations.
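For what it's worth, this is roughly how I checked what was actually sampled (a quick sketch iterating the ResultGrid from tuner.fit(); the Counter bit is just for illustration):

from collections import Counter

results = tuner.fit()
# Count distinct architecture tuples across all trials; I expected the
# 20 trials to draw fresh architecture samples, but I only ever see 6
# distinct ones (= max_concurrent_trials).
arch_counts = Counter(
    (r.config["vision_layers"], r.config["vision_width"], r.config["vision_patch_size"])
    for r in results
)
print(arch_counts)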
Before I try something more complex, like using a different scheduler for the architecture search and then PBT for a strict hyperparameter search (sketched below), I wanted to check that I wasn't missing something fundamental here.
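For concreteness, a rough sketch of that two-stage alternative, reusing my TrainMetricsPBT trainable; the fixed lr/weight_decay values in stage 1 and the ASHA settings are placeholders:

from ray import tune
from ray.tune.schedulers import ASHAScheduler

# Stage 1 (sketch): search only the architecture params with ASHA,
# holding the training hyperparameters fixed at placeholder values.
arch_space = {
    "lr": 1e-4,            # placeholder, fixed during architecture search
    "weight_decay": 0.05,  # placeholder, fixed during architecture search
    "vision_layers": tune.choice([4, 8, 10]),
    "vision_width": tune.choice([32, 64, 128]),
    "vision_patch_size": tune.choice([8, 12]),
}
arch_tuner = tune.Tuner(
    TrainMetricsPBT,
    param_space=arch_space,
    tune_config=tune.TuneConfig(
        scheduler=ASHAScheduler(metric="val_loss", mode="min"),
        num_samples=20,
    ),
)
best = arch_tuner.fit().get_best_result(metric="val_loss", mode="min").config

# Stage 2 (sketch): freeze the best architecture and run PBT over lr and
# weight_decay only, with the scheduler/Tuner setup from my config above.
pbt_space = {
    "lr": tune.loguniform(1e-5, 1e-3),
    "weight_decay": tune.loguniform(1e-6, 1e-4),
    "vision_layers": best["vision_layers"],
    "vision_width": best["vision_width"],
    "vision_patch_size": best["vision_patch_size"],
}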
Also, if I may offer some feedback: trials cloning other trials is super confusing. It would be much clearer if exploit simply terminated the old trial and started a new one, with a new ID and everything, while of course copying over the checkpoint of the cloned trial.
Thanks for the guidance.