Hey, thanks, I can reproduce the issue. This will be fixed soon. In the meantime, as a workaround, try not passing the early_stopping argument. That should make TuneSearchCV use the correct scheduler.
I updated tune-sklearn to 0.4.4.
Now it doesn't show the message, even with early_stopping=True.
But now (I don't remember if this message appeared before; I think it didn't) it shows me:
UserWarning: tune-sklearn implements incremental learning for xgboost models following this: https://github.com/dmlc/xgboost/issues/1686. This may negatively impact performance. To disable, set early_stopping=False.
Also, it shows me:
- The `callbacks.on_trial_result` operation took 1.253 s, which may be a performance bottleneck.
2022-10-06 09:09:40,163 WARNING util.py:220 -- The `process_trial_result` operation took 1.254 s, which may be a performance bottleneck.
2022-10-06 09:09:40,169 WARNING util.py:220 -- Processing trial results took 1.261 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
2022-10-06 09:09:40,170 WARNING util.py:220 -- The `process_trial_result` operation took 1.262 s, which may be a performance bottleneck.
2022-10-06 09:09:43,151 WARNING util.py:220 -- The `start_trial` operation took 1.314 s, which may be a performance bottleneck.
These messages appeared before the tune-sklearn update as well.
Hmm… I am receiving messages that I haven't seen before:
2022-10-06 09:18:50,164 WARNING util.py:220 -- The `start_trial` operation took 1.321 s, which may be a performance bottleneck.
(_Trainable pid=204883) 2022-10-06 09:21:39,959 INFO trainable.py:668 -- Restored on 10.20.0.57 from checkpoint: /tmp/checkpoint_tmp_tpz_yrdk
(_Trainable pid=204883) 2022-10-06 09:21:39,960 INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 430.8777885437012, '_episodes_total': None}
2022-10-06 09:21:40,158 WARNING util.py:220 -- The `start_trial` operation took 1.618 s, which may be a performance bottleneck.
2022-10-06 09:21:41,795 WARNING util.py:220 -- The `start_trial` operation took 1.636 s, which may be a performance bottleneck.
(_Trainable pid=209771) /home/ggous/miniconda3/envs/sklearn/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
(_Trainable pid=209771) from pandas import MultiIndex, Int64Index
(_Trainable pid=209771) 2022-10-06 09:21:48,120 INFO trainable.py:668 -- Restored on 10.20.0.57 from checkpoint: /tmp/checkpoint_tmp_4l38pd6u
(_Trainable pid=209771) 2022-10-06 09:21:48,120 INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 429.09668040275574, '_episodes_total': None}
The bottleneck messages can be ignored. As the warning says, using early stopping with XGBoost may negatively impact performance in some cases, especially for small datasets. BOHB combines Bayesian optimization with early stopping, which means early stopping cannot be turned off. You can choose a different search algorithm, e.g. Optuna or Hyperopt, which can work without early stopping.