TuneSearchCV with search_optimization='bohb': "Are you using HyperBandForBOHB as a scheduler?"

Hello,

I wanted to ask something.

According to the Scikit-Learn API (tune.sklearn) — Ray 2.0.0 documentation, we can choose search_optimization='bohb'.

So I am using this in my TuneSearchCV, and I am receiving:

BOHB Info not detected in result. Are you using HyperBandForBOHB as a scheduler?

So I am not sure: does it use BOHB or not?

Thanks

Hey @George, can you show how you call TuneSearchCV?

Hi, yes:


import xgboost as xgb
from tune_sklearn import TuneSearchCV

# kfold, x_train, y_train and xgb_params are defined earlier
for i, (train_idx, val_idx) in enumerate(kfold.split(x_train, y_train)):
    x_train_, y_train_ = x_train[train_idx, :], y_train[train_idx]
    x_val_, y_val_ = x_train[val_idx, :], y_train[val_idx]

    # Initialize classifier
    xgb_class = xgb.XGBClassifier(objective='multi:softprob',
                                  use_label_encoder=False,
                                  eval_metric=["merror", "mlogloss", "auc"],
                                  seed=123,
                                  enable_categorical=False)

    model = TuneSearchCV(
        xgb_class,
        param_distributions=xgb_params,
        n_trials=20,
        max_iters=10,
        search_optimization='bohb',
        early_stopping=True,
        scoring='f1_micro',
        n_jobs=4,
        name='Ray tune',
        verbose=0,
        local_dir='./ray_results',
    )

    # eval_metric, early_stopping_rounds and eval_set are forwarded
    # to XGBClassifier.fit for each trial
    history = model.fit(x_train_,
                        y_train_,
                        eval_metric=["merror", "mlogloss", "auc"],
                        early_stopping_rounds=20,
                        eval_set=[(x_train_, y_train_), (x_val_, y_val_)])

Hey, thanks, I can reproduce the issue. This will be fixed soon. In the meantime, as a workaround, try not passing the early_stopping argument. That should make TuneSearchCV use the correct scheduler.
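For illustration, a minimal sketch of that workaround (not from the thread, and reusing the xgb_class and xgb_params names from the snippet above). With early_stopping left out, tune-sklearn picks the scheduler matching search_optimization='bohb' itself:

from tune_sklearn import TuneSearchCV

# Same call as above, but with early_stopping omitted so that
# tune-sklearn selects the appropriate scheduler for 'bohb'.
model = TuneSearchCV(
    xgb_class,                       # classifier from the snippet above
    param_distributions=xgb_params,  # search space from the snippet above
    n_trials=20,
    max_iters=10,
    search_optimization='bohb',
    scoring='f1_micro',
    n_jobs=4,
    local_dir='./ray_results',
)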

Fixed in Release tune-sklearn 0.4.4 · ray-project/tune-sklearn · GitHub!

Nice @Yard1 !

I updated tune-sklearn to 0.4.4.
Now it doesn't show the message, even with early_stopping=True.

But now (I don't remember if this message appeared before; I think it didn't), it shows me:

UserWarning: tune-sklearn implements incremental learning for xgboost models following this: https://github.com/dmlc/xgboost/issues/1686. This may negatively impact performance. To disable, set early_stopping=False.

Also, it shows me:

The `callbacks.on_trial_result` operation took 1.253 s, which may be a performance bottleneck.
2022-10-06 09:09:40,163	WARNING util.py:220 -- The `process_trial_result` operation took 1.254 s, which may be a performance bottleneck.
2022-10-06 09:09:40,169	WARNING util.py:220 -- Processing trial results took 1.261 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
2022-10-06 09:09:40,170	WARNING util.py:220 -- The `process_trial_result` operation took 1.262 s, which may be a performance bottleneck.
2022-10-06 09:09:43,151	WARNING util.py:220 -- The `start_trial` operation took 1.314 s, which may be a performance bottleneck.

These messages were also there before the tune-sklearn update.

Hmm… I am receiving messages that I haven't seen before:

2022-10-06 09:18:50,164	WARNING util.py:220 -- The `start_trial` operation took 1.321 s, which may be a performance bottleneck.
(_Trainable pid=204883) 2022-10-06 09:21:39,959	INFO trainable.py:668 -- Restored on 10.20.0.57 from checkpoint: /tmp/checkpoint_tmp_tpz_yrdk
(_Trainable pid=204883) 2022-10-06 09:21:39,960	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 430.8777885437012, '_episodes_total': None}
2022-10-06 09:21:40,158	WARNING util.py:220 -- The `start_trial` operation took 1.618 s, which may be a performance bottleneck.
2022-10-06 09:21:41,795	WARNING util.py:220 -- The `start_trial` operation took 1.636 s, which may be a performance bottleneck.
(_Trainable pid=209771) /home/ggous/miniconda3/envs/sklearn/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
(_Trainable pid=209771)   from pandas import MultiIndex, Int64Index
(_Trainable pid=209771) 2022-10-06 09:21:48,120	INFO trainable.py:668 -- Restored on 10.20.0.57 from checkpoint: /tmp/checkpoint_tmp_4l38pd6u
(_Trainable pid=209771) 2022-10-06 09:21:48,120	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 429.09668040275574, '_episodes_total': None}

I tried to use early_stopping=False, but I am receiving:

early_stopping must not be False when using BOHB

The bottleneck messages can be ignored. As the warning says, using early stopping with XGBoost may negatively impact performance in some cases, especially for small datasets. BOHB combines Bayesian optimization with early stopping, which means early stopping cannot be turned off. You can choose a different search algorithm, e.g. optuna or hyperopt, which can work without early stopping.
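As a sketch of that suggestion (again assuming the same xgb_class and xgb_params as in the snippet above), switching to Optuna lets you turn early stopping off:

from tune_sklearn import TuneSearchCV

# Sketch: Optuna-based search with early stopping disabled.
# xgb_class and xgb_params are assumed to be defined as earlier.
model = TuneSearchCV(
    xgb_class,
    param_distributions=xgb_params,
    n_trials=20,
    search_optimization='optuna',
    early_stopping=False,  # allowed here, unlike with 'bohb'
    scoring='f1_micro',
    n_jobs=4,
    local_dir='./ray_results',
)
model.fit(x_train, y_train)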
