TuneSearchCV with search_optimization='bohb': "Are you using HyperBandForBOHB as a scheduler?"

Hello,

I wanted to ask something.

According to the Scikit-Learn API (tune.sklearn) — Ray 2.0.0 documentation, we can choose search_optimization='bohb'.

So I am using this in my TuneSearchCV, and I am receiving:

BOHB Info not detected in result. Are you using HyperBandForBOHB as a scheduler?

So I am not sure: does it use BOHB or not?

Thanks

Hey @George, can you show how you call TuneSearchCV?

Hi, yes:


import xgboost as xgb
from tune_sklearn import TuneSearchCV

# kfold, x_train, y_train and xgb_params are defined earlier
for i, (train_idx, val_idx) in enumerate(kfold.split(x_train, y_train)):
    x_train_, y_train_ = x_train[train_idx, :], y_train[train_idx]
    x_val_, y_val_ = x_train[val_idx, :], y_train[val_idx]

    # Initialize classifier
    xgb_class = xgb.XGBClassifier(objective='multi:softprob',
                                  use_label_encoder=False,
                                  eval_metric=["merror", "mlogloss", "auc"],
                                  seed=123,
                                  enable_categorical=False)

    model = TuneSearchCV(
        xgb_class,
        param_distributions=xgb_params,
        n_trials=20,
        max_iters=10,
        search_optimization='bohb',
        early_stopping=True,
        scoring='f1_micro',
        n_jobs=4,
        name='Ray tune',
        verbose=0,
        local_dir='./ray_results',
    )

    # eval_metric, early_stopping_rounds and eval_set are forwarded
    # to XGBClassifier.fit for each trial
    history = model.fit(x_train_,
                        y_train_,
                        eval_metric=["merror", "mlogloss", "auc"],
                        early_stopping_rounds=20,
                        eval_set=[(x_train_, y_train_), (x_val_, y_val_)])

Hey, thanks, I can reproduce the issue. This will be fixed soon. In the meantime, as a workaround, try not passing the early_stopping argument. That should make TuneSearchCV use the correct scheduler.
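For illustration, a minimal sketch of that workaround (not from the thread, and reusing the xgb_class and xgb_params names from the snippet above). With early_stopping left out, tune-sklearn picks the scheduler matching search_optimization='bohb' itself:

from tune_sklearn import TuneSearchCV

# Same call as above, but with early_stopping omitted so that
# tune-sklearn selects the appropriate scheduler for 'bohb'.
model = TuneSearchCV(
    xgb_class,                       # classifier from the snippet above
    param_distributions=xgb_params,  # search space from the snippet above
    n_trials=20,
    max_iters=10,
    search_optimization='bohb',
    scoring='f1_micro',
    n_jobs=4,
    local_dir='./ray_results',
)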

Fixed in Release tune-sklearn 0.4.4 · ray-project/tune-sklearn · GitHub!

Nice @Yard1 !

I updated tune-sklearn to 0.4.4.
Now it doesn't show the message, even with early_stopping=True.

But now (I don't remember if this message appeared before; I think it didn't), it shows me:

UserWarning: tune-sklearn implements incremental learning for xgboost models following this: https://github.com/dmlc/xgboost/issues/1686. This may negatively impact performance. To disable, set early_stopping=False.

Also, it shows me:

The `callbacks.on_trial_result` operation took 1.253 s, which may be a performance bottleneck.
2022-10-06 09:09:40,163	WARNING util.py:220 -- The `process_trial_result` operation took 1.254 s, which may be a performance bottleneck.
2022-10-06 09:09:40,169	WARNING util.py:220 -- Processing trial results took 1.261 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
2022-10-06 09:09:40,170	WARNING util.py:220 -- The `process_trial_result` operation took 1.262 s, which may be a performance bottleneck.
2022-10-06 09:09:43,151	WARNING util.py:220 -- The `start_trial` operation took 1.314 s, which may be a performance bottleneck.

These messages were also there before the tune-sklearn update.

Hmm… I am receiving messages that I haven't seen before:

2022-10-06 09:18:50,164	WARNING util.py:220 -- The `start_trial` operation took 1.321 s, which may be a performance bottleneck.
(_Trainable pid=204883) 2022-10-06 09:21:39,959	INFO trainable.py:668 -- Restored on 10.20.0.57 from checkpoint: /tmp/checkpoint_tmp_tpz_yrdk
(_Trainable pid=204883) 2022-10-06 09:21:39,960	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 430.8777885437012, '_episodes_total': None}
2022-10-06 09:21:40,158	WARNING util.py:220 -- The `start_trial` operation took 1.618 s, which may be a performance bottleneck.
2022-10-06 09:21:41,795	WARNING util.py:220 -- The `start_trial` operation took 1.636 s, which may be a performance bottleneck.
(_Trainable pid=209771) /home/ggous/miniconda3/envs/sklearn/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
(_Trainable pid=209771)   from pandas import MultiIndex, Int64Index
(_Trainable pid=209771) 2022-10-06 09:21:48,120	INFO trainable.py:668 -- Restored on 10.20.0.57 from checkpoint: /tmp/checkpoint_tmp_4l38pd6u
(_Trainable pid=209771) 2022-10-06 09:21:48,120	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 429.09668040275574, '_episodes_total': None}

I tried to use early_stopping=False, but I am receiving:

early_stopping must not be False when using BOHB

The bottleneck messages can be ignored. As the warning says, using early stopping with XGBoost may negatively impact performance in some cases, especially for small datasets. BOHB combines Bayesian optimization with early stopping, which means early stopping cannot be turned off. You can choose a different search algorithm, e.g. optuna or hyperopt, which can work without early stopping.
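As a sketch of that suggestion (again assuming the same xgb_class and xgb_params as in the snippet above), switching to Optuna lets you turn early stopping off:

from tune_sklearn import TuneSearchCV

# Sketch: Optuna-based search with early stopping disabled.
# xgb_class and xgb_params are assumed to be defined as earlier.
model = TuneSearchCV(
    xgb_class,
    param_distributions=xgb_params,
    n_trials=20,
    search_optimization='optuna',
    early_stopping=False,  # allowed here, unlike with 'bohb'
    scoring='f1_micro',
    n_jobs=4,
    local_dir='./ray_results',
)
model.fit(x_train, y_train)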
