Complicated Search Spaces with HyperOpt

I am currently trying to use HyperOpt with a CNN, but some of my architectural hyperparameters have complicated search spaces. When using Ray Tune's default random search instead of HyperOpt, I can create functions for these search spaces and use tune.sample_from(func), but HyperOpt does not support functional search spaces, and I have not been able to figure out a way to create the desired search space with the other sampling functions available. For example, one of the hyperparameters is a list of values to be passed to the filters argument of Conv1D, but the length of the list is a random integer between 3 and 7 (representing the number of convolutional blocks), and the values should be non-decreasing random powers of 2. Would you be able to tune such a hyperparameter using HyperOpt? Thank you!
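For reference, my current random-search version looks roughly like this (a simplified sketch; the exponent range 3-7 is just an illustrative choice):

import numpy as np
from ray import tune

def sample_filters(spec):
    # Random number of conv blocks: 3-7 inclusive
    n = np.random.randint(3, 8)
    # Non-decreasing random exponents -> non-decreasing powers of 2
    powers = np.sort(np.random.randint(3, 8, size=n))
    return [2 ** p for p in powers]

config = {"filters": tune.sample_from(sample_filters)}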


Hi @kevinli,

I think you might be able to achieve something like this with some trickery.

Generally you'd have one variable that samples the length of the list. You always sample filter sizes for all possible filters, but ignore those whose index exceeds the sampled length. For sampling non-decreasing powers of two, I'd suggest sampling, per filter, a small integer (say 0-3) indicating the additional power added on top of the previous filter. So something like this might work:

config = {
    "num_filters": tune.randint(3, 8),  # upper bound is exclusive: samples 3-7
    "filter_1": tune.randint(0, 4),     # additional power of 2 per block: 0-3
    "filter_2": tune.randint(0, 4),
    "filter_3": tune.randint(0, 4),
    "filter_4": tune.randint(0, 4),
    "filter_5": tune.randint(0, 4),
    "filter_6": tune.randint(0, 4),
    "filter_7": tune.randint(0, 4),
}

And in the trainable:

def train(config):
    num_filters = config["num_filters"]
    pwr = 0  # maybe start with a higher pwr here
    for i in range(1, num_filters + 1):
        # Keys beyond num_filters are sampled but simply ignored
        additional_pwr = config[f"filter_{i}"]
        pwr += additional_pwr
        filter_size = 2 ** pwr  # non-decreasing power of 2
        # Create Conv1D with filter_size

(code untested)
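
If it helps, wiring this config up with HyperOpt might look roughly like the following (also untested; I'm assuming the ray.tune.suggest.hyperopt import path, which differs across Ray versions, and that the trainable reports a metric called "loss" via tune.report):

from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

search_alg = HyperOptSearch(metric="loss", mode="min")
analysis = tune.run(
    train,
    config=config,
    search_alg=search_alg,
    num_samples=50,  # illustrative sample budget
)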

This should be convertible to HyperOpt. Please note, though, that the sampled variables become dependent in this case (the filter size of the third layer also depends on the first and second layers). This may work with some optimization methods but not with others. I'm not an expert on Tree-structured Parzen Estimators, but I'd advise you to check whether the optimization method is still valid under these assumptions.

Thanks for the thorough response! This makes a lot of sense and I will definitely look into whether these assumptions will be valid for the optimization methods I use.