TuneSearchCV with TransformedTargetRegressor

F_S · April 4, 2023, 12:16pm

I want to tune an sklearn ElasticNet model that sits inside a nested sklearn pipeline. More specifically, my trainable looks as follows:

from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import RobustScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor
from ray.tune.sklearn import TuneSearchCV

pipeline = Pipeline([
    ('scaler', RobustScaler()),
    ('estimator', ElasticNet(max_iter=100))
])
        
ELN = Pipeline([
    ('model', TransformedTargetRegressor(regressor=pipeline, transformer=RobustScaler()))
])

I want to tune ELN with ray.tune, i.e.:

ELN_tune = TuneSearchCV(
    estimator=ELN,
    scoring='neg_mean_squared_error',
    param_distributions=pgrid,
    cv=pds,
    search_optimization='random',
    early_stopping=True,
    n_trials=4,
    max_iters=10,
    random_state=123,
    verbose=2,
)
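
The post does not show pgrid. As a rough sketch only, assuming the usual sklearn double-underscore convention for nested parameters, the keys for this estimator would follow the path from the outer Pipeline step down to the ElasticNet. The parameter values below are placeholders, not taken from the post, and whether plain lists suit a given search_optimization setting should be checked against the tune-sklearn documentation.

```python
# Hypothetical search space; the values are placeholders, not from the post.
# Key path: 'model' (outer Pipeline step)
#        -> 'regressor' (TransformedTargetRegressor argument)
#        -> 'estimator' (inner Pipeline step)
#        -> the ElasticNet parameter name.
pgrid = {
    "model__regressor__estimator__alpha": [0.01, 0.1, 1.0],
    "model__regressor__estimator__l1_ratio": [0.1, 0.5, 0.9],
}
```

Calling ELN.get_params().keys() lists every valid key, which is a quick way to confirm the path before starting a search.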

However, trying to tune yields the following error:

ValueError: Early stopping is not supported because the estimator does not have partial_fit, does not support warm_start, or is a tree classifier. Set early_stopping=False.

Yet, ElasticNet() does in fact support warm_start. Also, when I use the sklearn tuning equivalent (e.g. sklearn.model_selection.GridSearchCV), I get no error.

How do I solve this issue? It seems that Ray checks whether the estimator, in this case the TransformedTargetRegressor object, supports early stopping/partial fitting. However, the check wrongly fails here, because it should inspect not the TransformedTargetRegressor object but the pipeline object within it. Thanks in advance.

xwjiang2010 · April 4, 2023, 9:38pm

do you need early stopping though? Can you try without it? i.e. early_stopping=False.

F_S · April 5, 2023, 7:13am

Setting it to false works. However, this is not an appropriate solution. I need to use a scheduler and stop unpromising trials early.

The same problem occurs when using sklearn's SGDRegressor or any other estimator in combination with sklearn's TransformedTargetRegressor. Put differently: Ray's sklearn API works as long as you do not transform the output with TransformedTargetRegressor, but fails as soon as you do. I might be wrong, but this should not be the case, right?

xwjiang2010 · April 5, 2023, 5:02pm

I guess that the code probably didn’t take this nested pipeline case into consideration.
See the code here and the unit test here.

Would you be interested in making a pull request contribution to the logic there?
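
The kind of unwrapping such a change might need can be sketched as follows. This is not tune-sklearn's actual check; it is a minimal duck-typed illustration that assumes wrappers expose either a Pipeline-style `steps` list or a TransformedTargetRegressor-style `regressor` attribute. The helper names `unwrap_final_estimator` and `supports_incremental_fit` are hypothetical.

```python
def unwrap_final_estimator(estimator):
    """Descend through Pipeline-like and TransformedTargetRegressor-like
    wrappers and return the innermost estimator (hypothetical helper)."""
    current = estimator
    while True:
        steps = getattr(current, "steps", None)
        if steps:
            # Pipeline-like: the last (name, estimator) pair is the predictor.
            candidate = steps[-1][1]
        else:
            # TransformedTargetRegressor-like: the wrapped model is `regressor`.
            candidate = getattr(current, "regressor", None)
        # Stop at a leaf, an unset regressor, or a 'passthrough' placeholder.
        if candidate is None or isinstance(candidate, str):
            return current
        current = candidate


def supports_incremental_fit(estimator):
    """Simplified stand-in for the early-stopping check: only looks for a
    partial_fit method or a warm_start attribute on the object itself."""
    return hasattr(estimator, "partial_fit") or hasattr(estimator, "warm_start")
```

Applied to ELN from the first post, the unwrapping should end at the ElasticNet, which exposes warm_start, whereas the outer wrappers do not. Passing the check is only part of the fix: the early-stopping loop would also have to set warm_start and the iteration count on the inner estimator, which this sketch does not cover.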

F_S · April 5, 2023, 5:34pm

Thanks for the reply. It seems to be as you mentioned. I can make a pull request, sure. I will have a look at it later this week or on the weekend, when I have more time.
