Ray Tune PBT - Structural Hyperparameters

I’m using PBT to explore both typical hyperparameters and architectural ones:

from ray import train, tune
from ray.tune.schedulers import PopulationBasedTraining

# Search space covers both training hyperparameters and architectural choices
config = {
    "lr": tune.uniform(1e-6, 1e-3),
    "weight_decay": tune.uniform(0.01, 0.2),
    "vision_layers": tune.choice([4, 8, 10]),
    "vision_width": tune.choice([32, 64, 128]),
    "vision_patch_size": tune.choice([8, 12]),
}
scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="val_loss",
    mode="min",
    perturbation_interval=1,
    # only the parameters listed here are mutated by PBT on exploit
    hyperparam_mutations={
        "lr": tune.loguniform(1e-5, 1e-3),
        "weight_decay": tune.loguniform(1e-6, 1e-4),
    },
)
tuner = tune.Tuner(
    trainable=TrainMetricsPBT,
    run_config=train.RunConfig(
        stop=stop_fn,
        checkpoint_config=train.CheckpointConfig(
            checkpoint_score_attribute="val_loss",
            checkpoint_score_order="min",
            num_to_keep=4,
        ),
    ),
    tune_config=tune.TuneConfig(
        reuse_actors=True,
        scheduler=scheduler,
        max_concurrent_trials=6,
        num_samples=20,
    ),
    param_space=config,
)
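
For reference, this is how I launch the run and pull out the best trial afterwards (a minimal sketch; TrainMetricsPBT and stop_fn are defined elsewhere in my code):

results = tuner.fit()
best = results.get_best_result(metric="val_loss", mode="min")
print(best.config)      # winning hyperparameters, including the structural ones
print(best.checkpoint)  # checkpoint of the best-scoring trial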

Since max_concurrent_trials (6) is smaller than num_samples (20), I expected PBT to multiplex trials and explore additional structural hyperparameter combinations before settling into the usual exploit/explore cycle of hyperparameter perturbations. Instead it only seems to sample the full param space for the first 6 trials allowed by max_concurrent_trials.
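
As a rough sanity check I dump the per-trial configs from the ResultGrid after the run (using the results object from tuner.fit() above; Tune exposes config values as config/... columns in the dataframe):

df = results.get_dataframe()
structural = ["config/vision_layers", "config/vision_width", "config/vision_patch_size"]
# In my runs only 6 distinct architecture combinations show up, matching max_concurrent_trials
print(df[structural].drop_duplicates())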

Before I try something more complex, like running a separate scheduler for the architecture search and then using PBT strictly for hyperparameter search (sketched below), I wanted to check that I wasn't missing something fundamental here.
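
For context, the more complex two-stage approach I have in mind would look roughly like this (a sketch only; the fixed lr/weight_decay values in stage 1 are placeholders, and stage 2 would reuse the PBT tuner above with the winning architecture frozen):

from ray.tune.schedulers import ASHAScheduler

# Stage 1: architecture search only, with ASHA for cheap early stopping.
# grid_search enumerates the full 3 x 3 x 2 = 18 architecture combinations.
arch_tuner = tune.Tuner(
    TrainMetricsPBT,
    param_space={
        "lr": 3e-4,            # held fixed during the architecture sweep (placeholder)
        "weight_decay": 1e-5,  # likewise a placeholder
        "vision_layers": tune.grid_search([4, 8, 10]),
        "vision_width": tune.grid_search([32, 64, 128]),
        "vision_patch_size": tune.grid_search([8, 12]),
    },
    tune_config=tune.TuneConfig(
        scheduler=ASHAScheduler(metric="val_loss", mode="min"),
    ),
)
best_arch = arch_tuner.fit().get_best_result(metric="val_loss", mode="min").config

# Stage 2: PBT strictly over lr / weight_decay, architecture frozen to the stage-1 winner
pbt_space = {
    "lr": tune.loguniform(1e-5, 1e-3),
    "weight_decay": tune.loguniform(1e-6, 1e-4),
    "vision_layers": best_arch["vision_layers"],
    "vision_width": best_arch["vision_width"],
    "vision_patch_size": best_arch["vision_patch_size"],
}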

Also, if I may offer some feedback: trials cloning other trials is very confusing. It would be much clearer if exploit simply terminated the old trial and started a new one with a new ID and everything, while of course copying the checkpoint from the cloned trial.

Thanks for the guidance.

Never mind… I upgraded to the latest Ray and PBT is now multiplexing correctly. Thank you!