TuneSearchCV and search optimization

I’m using TuneSearchCV to tune classifiers. Here is the snippet I’m using:

```python
TuneSearchCV(model,
             param_distributions=config,
             n_trials=200,
             early_stopping=False,
             max_iters=1,
             search_optimization="bayesian",
             n_jobs=-1,
             refit=True,
             cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
             verbose=0,
             # loggers="tensorboard",
             random_state=42,
             local_dir="./ray_results")
```

My question is: do `search_optimization` and `n_jobs` work together in the above snippet?

The way I interpret it, `n_jobs=-1` will execute all 200 trials at once, so the `search_optimization` parameter serves no purpose. Should I instead use something like `n_jobs=50` and hope that each subsequent batch is picked better based on the results of the earlier ones?

Can someone please correct my interpretation?
Thanks!

Hi @tkmamidi, indeed `n_jobs=-1` allows as many concurrent trials as you have resources for. This obviously depends on your cluster setup, but assuming you are running on a cluster where 200 trials can be run concurrently, all of them will run at the same time.

In that case, yes, the search optimization is pretty meaningless: all configurations are sampled at the start of the run, so the search algorithm has no observed results to make better guesses from.
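To make this concrete, here is a toy, hypothetical simulation (not tune-sklearn code): a "suggest" step that can only exploit results from batches that have already finished. When the batch size equals the total number of trials (the `n_jobs=-1` case with ample resources), every configuration is proposed blind; with smaller batches, later suggestions can condition on earlier scores.

```python
import random

def run_search(n_trials, batch_size, seed=0):
    """Toy sequential-model-based search: each batch of suggestions
    can only use results observed from batches that already finished."""
    rng = random.Random(seed)
    observed = []   # (x, score) pairs seen so far
    informed = 0    # count of suggestions that could use prior results
    for start in range(0, n_trials, batch_size):
        batch = []
        for _ in range(min(batch_size, n_trials - start)):
            if observed:
                informed += 1
                # exploit: perturb the best point seen so far
                best_x, _ = max(observed, key=lambda t: t[1])
                x = best_x + rng.gauss(0, 0.1)
            else:
                # cold start: nothing observed yet, sample at random
                x = rng.uniform(-5, 5)
            batch.append(x)
        # the whole batch finishes before the next batch is suggested;
        # toy objective peaks at x = 2
        observed.extend((x, -(x - 2) ** 2) for x in batch)
    return informed

# One batch of 200: no suggestion can use any result.
assert run_search(200, 200) == 0
# Batches of 50: the 150 suggestions after the first batch are informed.
assert run_search(200, 50) == 150
```

The `informed` counter is the whole point: it is exactly the number of trials for which Bayesian optimization has any leverage, and it drops to zero when everything launches at once.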

Thus you may want to limit the number of jobs running in parallel. Depending on your problem, 50 seems fine, though you could also start with 20 or so. Just be aware that lower concurrency will increase total tuning time.

By the way, even if you are resource constrained by your cluster, you should still set `n_jobs` to an explicit number: even when trials cannot actually run concurrently, their configurations are still all sampled at the start of the run, so the search algorithm again gains nothing from earlier results.
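Putting the advice together, the original call with concurrency capped might look like the fragment below. This is an illustrative sketch only: `model` and `config` are the objects from the thread above, and 50 is just the starting point suggested here, not a recommended value.

```python
from sklearn.model_selection import StratifiedKFold
from tune_sklearn import TuneSearchCV

# model and config are assumed to be defined as in the original snippet
search = TuneSearchCV(model,
                      param_distributions=config,
                      n_trials=200,
                      search_optimization="bayesian",
                      n_jobs=50,  # cap concurrency so later batches can
                                  # use results from earlier batches
                      refit=True,
                      cv=StratifiedKFold(n_splits=5, shuffle=True,
                                         random_state=42),
                      random_state=42,
                      local_dir="./ray_results")
```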
