Hierarchical hyperparameter optimization

Imagine that you want to create a convolutional neural network, and you want to tune the number of layers and the filter size in each layer. In this case, the number of hyperparameters varies depending on the number of layers. Does Ray Tune provide a natural way to express this?

One way to do this would be to have a config like

{
  "num_layers": tune.randint(1, 4),
  "layer1_filter_h": tune.randint(1, 5),
  "layer1_filter_w": tune.randint(1, 5),
  ...,
  "layer3_filter_w": tune.randint(1, 5)
}

but then many of the hyperparameters will be unused when num_layers is smaller than 3.
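
For example, a single sampled configuration (the values here are purely illustrative) might look like this, where the layer-2 and layer-3 entries are never read because the model is built with only one layer:

{
  "num_layers": 1,
  "layer1_filter_h": 3,
  "layer1_filter_w": 2,
  "layer2_filter_h": 4,  # unused
  "layer2_filter_w": 1,  # unused
  "layer3_filter_h": 2,  # unused
  "layer3_filter_w": 4   # unused
}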

Hi @rshin, great question! Currently there is no “natural” way to do this in Ray Tune, i.e. we don’t ship dedicated utilities for it. However, since our search space definition allows dependent sampling via tune.sample_from, we can build this behavior with numpy.

Basically, you first sample a number of layers and then sample an array of that size. Here is an example:

from ray import tune
from ray.tune.suggest.variant_generator import generate_variants

import numpy as np

tune_config = {
    # Sample the number of layers first
    "num_layers": tune.randint(1, 4),
    # Then sample one filter height and one filter width per layer;
    # the array length depends on the num_layers value sampled above
    "filter_h": tune.sample_from(
        lambda spec: np.random.randint(1, 5, size=spec.config.num_layers)),
    "filter_w": tune.sample_from(
        lambda spec: np.random.randint(1, 5, size=spec.config.num_layers))
}

# This generates 10 variants
for _ in range(10):
    for _, variant in generate_variants(dict(config=tune_config)):
        config = variant["config"]
        print(f"Config: {config}")

        # This code will be in the training function to read the parameters
        for layer in range(config["num_layers"]):
            w = config["filter_w"][layer]
            h = config["filter_h"][layer]
            print(f"Layer 1 width={w} height={h}")

Please note, though, that this only works with random and grid search, not with custom search algorithms. It does work with schedulers like ASHA and PBT, however.
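
As a rough sketch of how this could be combined with a scheduler, the config above can be passed to tune.run as usual. Note that train_cnn here is a placeholder training function that only reports a fake loss; a real implementation would build a model with config["num_layers"] layers using the per-layer filter sizes:

from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_cnn(config):
    # Placeholder "training" loop: just combines the sampled filter sizes
    # into a fake loss so ASHA has something to schedule on.
    for step in range(10):
        fake_loss = sum(
            config["filter_h"][layer] + config["filter_w"][layer]
            for layer in range(config["num_layers"])
        ) / (step + 1)
        tune.report(loss=fake_loss)

analysis = tune.run(
    train_cnn,
    config=tune_config,  # the search space defined above
    num_samples=20,
    scheduler=ASHAScheduler(metric="loss", mode="min"),
)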

Does this help?

Yes, that makes sense! Thank you for the code example.

I understand that by making the sampling of some parameters depend on others, we can effectively get the hierarchical behavior. However, as you mentioned, this approach only works with random/grid search and schedulers. Do you know if any of the search algorithms supported by Ray Tune can natively handle these kinds of search spaces, and if so, whether there are any plans to add support for that in Ray Tune in the future?