Offline RL with Cross Validation for Hyperparameter Search


I am running an offline RL experiment using Ray RLlib. Since my dataset is small, I am trying to do nested cross-validation. The outer loop is simple: I have serialized the outer train/test folds into separate offline .json experience files. Cross-validating within each outer training fold is the hard part. Is there currently a way to implement hyperparameter search with cross-validation in Ray Tune (other than using only grid search and the Repeater class, such as here)? That is, for hyperparameter tuning, I would like to use some portions of the outer training fold for inner training and the remaining portions for inner validation, with these splits rotating within a single hyperparameter search iteration, so that each sampled configuration is evaluated on every inner split.

I could serialize separate datasets for each of the K inner training and inner validation folds and run one search per fold, but a search algorithm such as Bayesian optimization will then sample different hyperparameter configurations for each inner split. That prevents me from running the folds as separate trials and then aggregating the results per hyperparameter configuration.
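To make the goal concrete, here is a rough sketch in plain Python (no Ray calls) of what I would like a single trial to do: evaluate one sampled configuration on all K inner folds and return one aggregated score, so the search algorithm only ever sees a single metric per configuration. The names `make_inner_folds`, `cross_validated_score`, and `train_and_score` are hypothetical; `train_and_score` stands in for whatever trains on the inner training episodes and scores on the inner validation episodes.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]


def make_inner_folds(episode_ids: Sequence[int], k: int) -> List[Fold]:
    """Split the outer-train episode ids into k (inner_train, inner_val) pairs."""
    ids = list(episode_ids)
    folds = []
    for i in range(k):
        # Every k-th episode, starting at offset i, goes to validation.
        val = ids[i::k]
        val_set = set(val)
        train = [e for e in ids if e not in val_set]
        folds.append((train, val))
    return folds


def cross_validated_score(
    config: Dict[str, float],
    episode_ids: Sequence[int],
    k: int,
    train_and_score: Callable[[Dict[str, float], List[int], List[int]], float],
) -> float:
    """Evaluate one configuration on all k inner folds and return the mean score."""
    scores = [
        train_and_score(config, inner_train, inner_val)
        for inner_train, inner_val in make_inner_folds(episode_ids, k)
    ]
    return mean(scores)
```

Ideally the body of a Tune trainable would call something like `cross_validated_score` and report the mean as the trial's metric, but I am not sure whether this is the intended way to do it with offline data.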

Any help is appreciated.