[Tune] Feature request: Using Ray Tune for K-Fold with Repeated Grid Sampling

I was trying to use Ray Tune with my K-fold index k as a hyperparameter, in order to generate a table of accuracies for comparison across the K folds. The index k selects which dataset split to use. However, when I put k into a grid search, each k does run exactly once, but each run gets different values for the remaining parameters.

Is it possible to include a new search space API that repeats each sampled parameter set for every k?

e.g.
Instead of…

config = {"p1": tune.loguniform(lr_min, lr_max),
          "p2": tune.choice([1,2,3]),
          "k": tune.grid_search([*range(k)]),}

k = 1, p1=0.2, p2=1
k = 2, p1=0.6, p2=3
k = 3, p1=0.1, p2=3

or

config = {"p1": tune.grid_search([0.1,0.2,0.6]),
          "p2": tune.grid_search([1,2,3]),
          "k": tune.grid_search([*range(k)]),}

k = 1, p1=0.2, p2=1
k = 2, p1=0.2, p2=1
k = 3, p1=0.2, p2=1
k = 1, p1=0.6, p2=3
k = 2, p1=0.6, p2=3
k = 3, p1=0.6, p2=3
k = 1, p1=0.1, p2=3
k = 2, p1=0.1, p2=3
k = 3, p1=0.1, p2=3

suggestion:

config = {"p1": tune.loguniform(lr_min, lr_max),
          "p2": tune.choice([1,2,3]),
          "k": tune.repeated_grid_search([*range(k)]),}

k = 1, p1=0.2, p2=1
k = 2, p1=0.2, p2=1
k = 3, p1=0.2, p2=1
k = 1, p1=0.6, p2=3
k = 2, p1=0.6, p2=3
k = 3, p1=0.6, p2=3
k = 1, p1=0.1, p2=3
k = 2, p1=0.1, p2=3
k = 3, p1=0.1, p2=3
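The intended enumeration can be sketched in plain Python without the Tune API. Note that `repeated_grid_search` itself is hypothetical (it does not exist in Tune), and the log-uniform/choice sampling below merely mimics `tune.loguniform` and `tune.choice`:

```python
import math
import random

def repeated_grid_configs(num_samples, k, lr_min=1e-4, lr_max=1e-1):
    """Draw (p1, p2) once per sample, then anchor that pair across all folds.

    Sketch of the proposed behaviour only: the randomly sampled parameters
    are held constant while the fold index sweeps over its grid.
    """
    configs = []
    for _ in range(num_samples):
        # Sampled once per set, mimicking tune.loguniform and tune.choice.
        p1 = math.exp(random.uniform(math.log(lr_min), math.log(lr_max)))
        p2 = random.choice([1, 2, 3])
        # Repeated for every fold index, mimicking the proposed
        # repeated_grid_search over range(k).
        for fold in range(k):
            configs.append({"p1": p1, "p2": p2, "k": fold})
    return configs
```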

The output of the second case looks the same as the suggestion, but the suggestion samples the other parameters randomly, whereas the second case uses a predefined grid: the choices are fixed up front, so the search cannot react to results as they come in. Unlike plain grid search, the suggested API would keep parameter selection dynamic, driven by past results.

Here, in the suggestion example, p1 and p2 are still sampled by the search algorithm to optimize the tuning.
It would be even better if there were a feature that let the dynamic hyperparameter sampling be driven by the average performance across the k folds.
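Independently of Tune, the averaging step described above can be illustrated as a simple reduction: group per-fold results by the non-fold hyperparameters and return one mean score per parameter set. This is a sketch only; the `(config, accuracy)` data layout is an assumption for illustration, not a Tune structure:

```python
from collections import defaultdict
from statistics import mean

def fold_averaged_results(results):
    """Collapse per-fold accuracies into one score per hyperparameter set.

    `results` is a list of (config, accuracy) pairs where each config
    carries its fold index under "k". The returned per-set averages are
    what a search algorithm would ideally consume instead of noisy
    per-fold values.
    """
    groups = defaultdict(list)
    for config, accuracy in results:
        # Key on every parameter except the fold index "k".
        key = tuple(sorted(
            (name, value) for name, value in config.items() if name != "k"))
        groups[key].append(accuracy)
    return {key: mean(values) for key, values in groups.items()}
```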

cc @kai – thoughts about this?

Hi @Calvin, I’ve seen this request before, and we were tracking it here: [tune] Support resolving grid search variables before random samples · Issue #15126 · ray-project/ray · GitHub

I just submitted a PR here that will introduce the functionality, though with a different API: [tune] Add option to keep random values constant over grid search by krfricke · Pull Request #16501 · ray-project/ray · GitHub

Will this solve your request?


@kai Yes and no.

It would be great to have the repeated sampling solved in the future, but in addition to that, I am also suggesting that the search make use of the average performance over the anchored parameter k. This seems to be an independent feature on top of the previous thread.

I am new to Ray Tune. Correct me if I am wrong: I thought Ray Tune samples parameters dynamically during the search, based on the performance reported by finished hyperparameter sets. In other words, if we could compare the average performance across all k folds before sampling the next set of random hyperparameters, the results should hopefully be more reliable.
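One workaround under the current API is to move the fold loop inside the trainable, so each trial already reports a fold-averaged metric and the search algorithm never sees per-fold noise. A minimal sketch, where `evaluate_fold` is a hypothetical user-supplied function, not part of Tune:

```python
from statistics import mean

def cross_validated_trainable(config, evaluate_fold, num_folds=3):
    """Run every fold for one sampled parameter set and report the mean.

    `evaluate_fold(config, fold)` is assumed to train on that fold's split
    and return an accuracy. Because only the fold-averaged score is
    reported, the searcher's next sample is conditioned on cross-validated
    performance, as requested above.
    """
    scores = [evaluate_fold(config, fold) for fold in range(num_folds)]
    return {"mean_accuracy": mean(scores)}
```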