Ray Tune error resuming training from an AIR checkpoint

Hello Folks,

I’m using the TuneBOHB search algorithm, the HyperBandForBOHB scheduler, a ConcurrencyLimiter, and the RunConfig and search space below. The search space config determines the model structure.

config = {
    "layer_1": tune.choice([32, 64, 128]),
    "layer_2": tune.choice([64, 128, 256]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([32, 64, 128]),
}

run_config = RunConfig(
    name=exp_name,
    verbose=2,
    storage_path="./ray_results",
    log_to_file=True,
    checkpoint_config=CheckpointConfig(
        num_to_keep=2,
        checkpoint_score_attribute="ptl/val_accuracy",
        checkpoint_score_order="max",
    ),
)

I hit an error when a trial is restarted from the PAUSED state.

Resuming training from an AIR checkpoint. ...
....
Restoring states from the checkpoint path at /var/folders/ws/vv4c_tgx1bn762pg20bds4c80000gn/T/checkpoint_tmp_a55dc2a3602246e7a9960b51c415fb7c/model
.........
RuntimeError: Error(s) in loading state_dict for Classifier:
	size mismatch for layer_2.weight: copying a param with shape torch.Size([64, 128]) from checkpoint, the shape in current model is torch.Size([256, 128]).
	size mismatch for layer_2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([256]).

It looks like the model config differs between the trial that was paused and the one being restored, so the checkpointed weights don’t match the newly constructed model. My code is based on the example code, with only the scheduler and search algorithm changed; a rough sketch of how the model depends on the config is below.
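
For reference, the model builds its layer sizes from the sampled config, roughly like this (a simplified sketch inferred from the error message and the Lightning example; the real Classifier has more to it):

import pytorch_lightning as pl
from torch import nn

class Classifier(pl.LightningModule):
    def __init__(self, config):
        super().__init__()
        # Layer widths come from the search space, so two trials with
        # different "layer_1"/"layer_2" values produce state_dicts with
        # different shapes and cannot share a checkpoint.
        self.layer_1 = nn.Linear(28 * 28, config["layer_1"])
        self.layer_2 = nn.Linear(config["layer_1"], config["layer_2"])
        self.layer_3 = nn.Linear(config["layer_2"], 10)
        self.lr = config["lr"]

That matches the traceback: the checkpoint has layer_2 with 64 output units, while the restored trial builds layer_2 with 256, hence the size mismatch on layer_2.weight and layer_2.bias.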

Any advice is welcome. Thank you