Complicated Search Spaces with HyperOpt

I am currently trying to use HyperOpt with a CNN, but some of my architectural hyperparameters have complicated search spaces. When using the default random search for Ray Tune instead of HyperOpt, I can create functions for these search spaces and use tune.sample_from(func), but HyperOpt does not support functional search spaces and I am not able to figure out a way to create the desired search space using the other sampling functions available. For example, one of the hyperparameters is a list of values to be passed into the filter argument of Conv1D, but the length of the list is a random integer between 3 and 7 (representing the number of convolutional blocks) and the values should be nondecreasing and random powers of 2. Would you be able to tune such a hyperparameter using HyperOpt? Thank you!
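For reference, the distribution described in the question could be written as a plain Python function. That function is what one would wrap for Tune's default random search, e.g. tune.sample_from(lambda spec: sample_filters()). This is only a sketch: sample_filters is a hypothetical helper, and the exponent range 4 to 9 (16 to 512 filters) is an assumption, not something stated in the question.

```python
import random


def sample_filters(min_blocks=3, max_blocks=7, min_pwr=4, max_pwr=9, rng=random):
    """Sample a nondecreasing list of powers of two, one entry per conv block.

    Hypothetical helper; the exponent bounds are assumed, not from the question.
    """
    # random.randint is inclusive on both ends, so this gives 3..7 blocks.
    num_blocks = rng.randint(min_blocks, max_blocks)
    # Draw exponents independently, then sort them so the counts never decrease.
    pwrs = sorted(rng.randint(min_pwr, max_pwr) for _ in range(num_blocks))
    return [2 ** p for p in pwrs]
```

Sorting independent draws is just one simple way to enforce the ordering. Because the list length itself varies, HyperOpt cannot express this function directly.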


Hi @kevinli,

I think you might be able to achieve something like this with some trickery.

Generally you’d have one variable that samples the length of the list. You always sample the filter values for all possible blocks, but ignore those whose index is higher than the sampled length. For sampling nondecreasing powers of two, I’d suggest sampling a small number (0 to 3 or so) for each block, indicating the additional power added by that block. So something like this might work:

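from ray import tune

config = {
    "num_filters": tune.randint(3, 8),  # upper bound is exclusive: 3 to 7 blocks
    # Exponent offset added at each block (0 to 3). Offsets for blocks beyond
    # num_filters are still sampled but ignored by the trainable.
    "filter_1": tune.randint(0, 4),
    "filter_2": tune.randint(0, 4),
    "filter_3": tune.randint(0, 4),
    "filter_4": tune.randint(0, 4),
    "filter_5": tune.randint(0, 4),
    "filter_6": tune.randint(0, 4),
    "filter_7": tune.randint(0, 4),
}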

And in the trainable:

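def train(config):
    num_filters = config["num_filters"]
    pwr = 0  # Maybe start with a higher pwr here
    filters = []  # value for the Conv1D filters argument of each block
    for i in range(1, num_filters + 1):
        additional_pwr = config[f"filter_{i}"]
        pwr += additional_pwr  # offsets are >= 0, so the counts never decrease
        filters.append(2**pwr)
    # Create one Conv1D per entry in filters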

(code untested)
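Since the trainable above is untested, one could sanity-check the decoding locally without Ray. The sketch below mimics the flat search space with Python's random module (tune.randint excludes its upper bound, so randrange is the matching stdlib call) and decodes each sample into filter counts. sample_flat_config, decode_filters and the starting exponent base_pwr=4 (16 filters in the first block when its offset is 0) are hypothetical choices for illustration, not part of the Ray Tune API.

```python
import random

MAX_BLOCKS = 7


def sample_flat_config(rng):
    """Mimic the flat search space (hypothetical stand-in for Tune's sampling)."""
    config = {"num_filters": rng.randrange(3, 8)}  # like tune.randint(3, 8): 3..7
    for i in range(1, MAX_BLOCKS + 1):
        config[f"filter_{i}"] = rng.randrange(0, 4)  # like tune.randint(0, 4): 0..3
    return config


def decode_filters(config, base_pwr=4):
    """Turn the flat config into Conv1D filter counts; unused offsets are ignored."""
    pwr = base_pwr
    filters = []
    for i in range(1, config["num_filters"] + 1):
        pwr += config[f"filter_{i}"]  # cumulative, so the counts never decrease
        filters.append(2 ** pwr)
    return filters


print(decode_filters({"num_filters": 3, "filter_1": 0, "filter_2": 1, "filter_3": 2}))
# → [16, 32, 128]
```

One thing such a check makes visible: with seven blocks and offsets of up to 3, the exponent can grow by as much as 21, so the last block could need 2**25 filters. Narrowing the offset range or capping the exponent would keep the sizes practical.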

Since this space only uses tune.randint, it should be convertible to HyperOpt. Please note though that the sampled variables become dependent in this case: the filter count of the third block also depends on the offsets of the first and second blocks. Some optimization methods may cope with this and others may not. I’m not an expert on Tree-structured Parzen Estimators (TPE), but I’d advise you to check whether the optimization method is still valid under these assumptions.

Thanks for the thorough response! This makes a lot of sense and I will definitely look into whether these assumptions will be valid for the optimization methods I use.