Parallel Bayesian HP Optimisation

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Dear community,

I have a conceptual question about using Ray for distributed hyperparameter optimization with Bayesian search for any kind of estimator, e.g. a support vector machine (SVM) for binary classification.

Following the examples and the explanation in the documentation on the GitHub page, ray.tune.search.bayesopt seems to be the search algorithm I need. However, I don’t really understand how the parallelization works here.

The objective I want to maximize is the accuracy of the SVM classifier. BayesOpt proposes hyperparameter configurations (trials); each one is used to train the model and is then evaluated with the scoring function of the SVM. By default, BayesOpt picks 10 initial points as a kind of warm-up to fit a Gaussian process (GP) to the functional relationship between score and hyperparameters. After that, trials are chosen based on this GP.
My question now is: how many trials does the GP propose in each iteration before it is updated? If it is only 1, the execution would be almost serial after the warm-up, because we get 1 trial at a time and can only pick the next trial after the GP has been updated with the score for that trial. If it is more than 1, e.g. 5, we could train the model on these 5 trials in parallel.

Does anyone know what is happening behind the scenes?

Kind regards,
Jan

The search algorithm gets updated every time a trial reports a result. However, trials can still be requested and scheduled in between reported results. Let’s take an example:

Say we want to run 100 trials, each trial needs 1 CPU, and our cluster has 100 CPUs.

We could schedule all 100 trials to run simultaneously, but that would amount to randomly sampling the search space, since the GP has not been fit on any results yet.

This is not very useful, so you’ll usually want to set max_concurrent_trials in TuneConfig to limit the parallelism and prevent too many trials from being sampled before any results have come back.
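As a rough sketch of where that setting goes, the snippet below wires BayesOptSearch into a Tuner with a concurrency limit. It assumes ray[tune], bayesian-optimization and scikit-learn are installed. The dataset, the search ranges for C and gamma, and the names train_svm, run_search and TUNE_SETTINGS are illustrative assumptions, not something from the question.

```python
# Sketch only: dataset, search ranges and helper names are assumptions.

# Tune settings discussed in this thread, kept in one place.
TUNE_SETTINGS = {
    "metric": "accuracy",
    "mode": "max",
    "num_samples": 100,           # total number of trials
    "max_concurrent_trials": 10,  # at most 10 trials in flight at once
}


def train_svm(config):
    """Trainable: fit an SVM with the suggested hyperparameters and score it."""
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    clf = SVC(C=config["C"], gamma=config["gamma"])
    accuracy = cross_val_score(clf, X, y, cv=3).mean()
    # Returning a dict reports it as the trial's final result.
    return {"accuracy": float(accuracy)}


def run_search():
    """Build the Tuner and run the search (imports Ray only when called)."""
    from ray import tune
    from ray.tune.search.bayesopt import BayesOptSearch

    tuner = tune.Tuner(
        train_svm,
        param_space={
            "C": tune.uniform(0.1, 100.0),
            "gamma": tune.uniform(1e-4, 1.0),
        },
        tune_config=tune.TuneConfig(
            # 10 random warm-up points, which matches the default
            # mentioned in the question.
            search_alg=BayesOptSearch(random_search_steps=10),
            **TUNE_SETTINGS,
        ),
    )
    return tuner.fit()
```

Calling run_search() starts the run and returns the results. The parallelism knob is max_concurrent_trials: a smaller value means each new suggestion is based on more finished trials, at the cost of using fewer CPUs at once.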

For example, I could set max_concurrent_trials=10, in which case the first 10 trials will run with a random config. When the first one finishes, it reports its result to the search algorithm, a slot frees up, and the searcher can give better-informed samples for subsequent trials.
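To make the effect of the limit concrete, here is a toy simulation in plain Python (no Ray involved). It is a simplified model under stated assumptions, not how Tune is implemented: trial durations are random, a new trial is requested as soon as a slot is free, and the searcher is updated once per finished trial. The function name simulate is hypothetical. For each trial it records how many results the searcher had already seen when that trial's config was suggested.

```python
import heapq
import random


def simulate(num_trials, max_concurrent, seed=0):
    """Return, per trial, the number of results seen at suggestion time."""
    rng = random.Random(seed)
    running = []          # min-heap of finish times of in-flight trials
    now = 0.0
    results_seen = 0      # results reported to the searcher so far
    seen_at_suggest = []

    for _ in range(num_trials):
        # If every slot is busy, wait for the earliest trial to finish;
        # its result updates the searcher before the next suggestion.
        if len(running) >= max_concurrent:
            now = heapq.heappop(running)
            results_seen += 1
        seen_at_suggest.append(results_seen)
        # Start the new trial with an assumed random duration.
        heapq.heappush(running, now + rng.uniform(1.0, 2.0))
    return seen_at_suggest


all_at_once = simulate(100, max_concurrent=100)
limited = simulate(100, max_concurrent=10)
serial = simulate(5, max_concurrent=1)
```

In this model, all_at_once is 0 for every trial, so no suggestion benefits from any result. With the limit of 10, the first 10 entries of limited are 0 and each later suggestion has one more result behind it than the previous one. With a limit of 1 the search is fully sequential, which is the serial case described in the question.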