Hierarchical hyperparameter optimization

Imagine that you want to create a convolutional neural network, and you want to tune the number of layers and the filter size in each layer. In this case, the number of hyperparameters varies depending on the number of layers. Does Ray Tune provide a natural way to express this?

One way to do this would be to have a config like

{
  "num_layers": tune.randint(1, 4),
  "layer1_filter_h": tune.randint(1, 5),
  "layer1_filter_w": tune.randint(1, 5),
  ...,
  "layer3_filter_w": tune.randint(1, 5)
}

but then many of the hyperparameters will be unused when num_layers is smaller than 3.
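
For example, a single sampled configuration (the values here are purely illustrative) might look like this, where the layer-2 and layer-3 entries are never read because the model is built with only one layer:

{
  "num_layers": 1,
  "layer1_filter_h": 3,
  "layer1_filter_w": 2,
  "layer2_filter_h": 4,  # unused
  "layer2_filter_w": 1,  # unused
  "layer3_filter_h": 2,  # unused
  "layer3_filter_w": 4   # unused
}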

Hi @rshin, great question! Currently there is no “natural” way to do this in Ray Tune, i.e. we don’t ship dedicated utilities for it. However, since our search space definition allows dependent sampling via tune.sample_from, we can build this behavior with numpy.

Basically, you first sample a number of layers and then sample an array of that size. Here is an example:

from ray import tune
from ray.tune.suggest.variant_generator import generate_variants

import numpy as np

tune_config = {
    # Sample the number of layers first
    "num_layers": tune.randint(1, 4),
    # Then sample one filter height and one filter width per layer;
    # the array length depends on the num_layers value sampled above
    "filter_h": tune.sample_from(
        lambda spec: np.random.randint(1, 5, size=spec.config.num_layers)),
    "filter_w": tune.sample_from(
        lambda spec: np.random.randint(1, 5, size=spec.config.num_layers))
}

# This generates 10 variants
for _ in range(10):
    for _, variant in generate_variants(dict(config=tune_config)):
        config = variant["config"]
        print(f"Config: {config}")

        # This code will be in the training function to read the parameters
        for layer in range(config["num_layers"]):
            w = config["filter_w"][layer]
            h = config["filter_h"][layer]
            print(f"Layer 1 width={w} height={h}")

Please note, though, that this only works with random and grid search, not with custom search algorithms. It does work with schedulers like ASHA and PBT, however.
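
As a rough sketch of how this could be combined with a scheduler, the config above can be passed to tune.run as usual. Note that train_cnn here is a placeholder training function that only reports a fake loss; a real implementation would build a model with config["num_layers"] layers using the per-layer filter sizes:

from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_cnn(config):
    # Placeholder "training" loop: just combines the sampled filter sizes
    # into a fake loss so ASHA has something to schedule on.
    for step in range(10):
        fake_loss = sum(
            config["filter_h"][layer] + config["filter_w"][layer]
            for layer in range(config["num_layers"])
        ) / (step + 1)
        tune.report(loss=fake_loss)

analysis = tune.run(
    train_cnn,
    config=tune_config,  # the search space defined above
    num_samples=20,
    scheduler=ASHAScheduler(metric="loss", mode="min"),
)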

Does this help?

Yes, that makes sense! Thank you for the code example.

I understand that by making the sampling of some parameters depend on others, we can effectively get the hierarchical behavior. However, as you mentioned, this approach only works with random/grid search and schedulers. Do you know if any of the search algorithms supported by Ray Tune can natively handle these kinds of search spaces, and if so, whether there are any plans to add support for that in Ray Tune in the future?