TuneSearchCV with TransformedTargetRegressor

I want to tune an sklearn ElasticNet model that sits inside a nested sklearn pipeline. More specifically, my trainable looks as follows:

from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import RobustScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor
from ray.tune.sklearn import TuneSearchCV

pipeline = Pipeline([
    ('scaler', RobustScaler()),
    ('estimator', ElasticNet(max_iter=100))
])
        
ELN = Pipeline([
    ('model', TransformedTargetRegressor(regressor=pipeline, transformer=RobustScaler()))
])

I want to tune ELN with ray.tune, i.e.:

ELN_tune = TuneSearchCV(
    estimator=ELN,
    scoring='neg_mean_squared_error',
    param_distributions=pgrid,
    cv=pds,
    search_optimization='random',
    early_stopping=True,
    n_trials=4,
    max_iters=10,
    random_state=123,
    verbose=2,
)

However, trying to tune yields the following error:

ValueError: Early stopping is not supported because the estimator does not have partial_fit, does not support warm_start, or is a tree classifier. Set early_stopping=False.

Yet, ElasticNet() does in fact support warm_start. Also, when I use the sklearn tuning equivalent (e.g. sklearn.model_selection.GridSearchCV), I get no error.

How do I solve this issue? It seems that Ray checks whether the outer estimator, in this case the TransformedTargetRegressor object, supports early stopping or partial fitting. The check wrongly fails here, because it should inspect the pipeline inside the TransformedTargetRegressor rather than the TransformedTargetRegressor itself. Thanks in advance.

Do you need early stopping, though? Can you try without it, i.e. early_stopping=False?

Setting it to false works. However, this is not an appropriate solution. I need to use a scheduler and stop unpromising trials early.

The same problem occurs when using sklearn's SGDRegressor or any other estimator in combination with sklearn's TransformedTargetRegressor. Put differently: Ray's sklearn API works as long as you do not transform the target with TransformedTargetRegressor, but fails as soon as you do. I might be wrong, but this should not be the case, right?

I guess that the code probably didn't take this nested pipeline case into consideration.
See the code here and the unit test here.
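
To make the nested case concrete, here is a minimal sketch of a capability check that looks through wrappers before testing for partial_fit or warm_start. It is not the actual tune-sklearn code: unwrap_final_estimator and can_early_stop are hypothetical names, and the Fake* classes are stand-ins that only mimic the attributes used (steps, regressor, warm_start), so the snippet runs without scikit-learn installed. The real Pipeline and TransformedTargetRegressor expose attributes with those same names.

```python
# Hypothetical stand-ins: they mimic only the attributes the check relies on.
class FakeElasticNet:
    def __init__(self, max_iter=100, warm_start=False):
        self.max_iter = max_iter
        self.warm_start = warm_start


class FakePipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) tuples, as in sklearn's Pipeline


class FakeTransformedTargetRegressor:
    def __init__(self, regressor):
        self.regressor = regressor  # same attribute name as sklearn's wrapper


def unwrap_final_estimator(estimator):
    """Follow Pipeline-like and TransformedTargetRegressor-like wrappers inward."""
    while True:
        if hasattr(estimator, "steps") and estimator.steps:
            # Pipeline-like: continue with the last step's estimator.
            estimator = estimator.steps[-1][1]
        elif getattr(estimator, "regressor", None) is not None:
            # TransformedTargetRegressor-like: continue with the wrapped regressor.
            estimator = estimator.regressor
        else:
            return estimator


def can_early_stop(estimator):
    """Hypothetical check: partial_fit or warm_start on the innermost estimator."""
    inner = unwrap_final_estimator(estimator)
    has_partial_fit = callable(getattr(inner, "partial_fit", None))
    return has_partial_fit or hasattr(inner, "warm_start")


# Same nesting as ELN in the question: Pipeline -> TTR -> Pipeline -> ElasticNet.
inner_pipeline = FakePipeline([("scaler", object()), ("estimator", FakeElasticNet())])
eln = FakePipeline([("model", FakeTransformedTargetRegressor(inner_pipeline))])

print(hasattr(eln.steps[-1][1], "warm_start"))  # False: the wrapper itself has no warm_start
print(can_early_stop(eln))                      # True: the innermost estimator does
```

Detecting the capability is only half of a fix. The tuner would also have to set warm_start or call partial_fit on the inner estimator through the wrappers, and as far as I know TransformedTargetRegressor fits a fresh clone of its regressor on every fit call, so any change would need to handle that as well.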

Would you be interested in making a pull request contribution to the logic there?

Thanks for the reply. It seems to be as you mentioned. Sure, I can make a pull request. I will have a look at it later this week or on the weekend, when I have more time.
