Tuning xgboost with early stopping

royalts · November 26, 2020, 7:42am

I’m trying to perform hyperparameter tuning for an xgboost classifier. I gather there are at least two ways to do this and I’m trying to figure out what exactly they do and how they might differ:

(1) tune.run() with the native xgboost interface (https://docs.ray.io/en/latest/tune/tutorials/tune-xgboost.html)
(2) TuneSearchCV using the sklearn interface to xgboost (https://github.com/ray-project/tune-sklearn/blob/master/examples/xgbclassifier.py).

How exactly does early stopping work in either of those cases? Are there differences between these two implementations?

If, say, I run (2) with a classifier with n_estimators=500 and set early_stopping=True and n_trials=20, then Tune will run 20 parameter combinations, but not all of them for the full 500 boosting rounds, correct?

But what determines after how many rounds the score will be checked and which trials will be prematurely abandoned? Is there a way for me to see after the fact how each of these trials was handled, i.e. when it was killed and why?

rliaw · November 26, 2020, 8:05am

Hey @royalts thanks for dropping by!

In the tune.run case, early stopping will occur between boosting rounds (assuming you use the xgboost callback in Tune). The early stopping decision is made by whichever Tune scheduler you choose.
In tune-sklearn, we actually implement incremental fitting for xgboost models. By default the stopping decision is made with ASHA, but you can provide any Tune scheduler. You can set TuneSearchCV(verbose=...) to see how/when decisions are made.
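
To make the "fit incrementally, then let a scheduler decide" idea concrete, here is a minimal toy sketch in plain Python. It is not Tune's or tune-sklearn's actual implementation: it uses a simplified synchronous successive-halving rule rather than ASHA (which is asynchronous), the scores are simulated instead of coming from xgboost, and the function name, rung values, and score curve are all made up for illustration. It also records at which round each trial stopped, which is the kind of per-trial history asked about above.

```python
import random


def successive_halving(n_trials=20, rungs=(10, 30, 90, 270),
                       max_rounds=500, reduction_factor=3, seed=0):
    """Toy synchronous successive halving over simulated trials.

    Returns a dict mapping trial id -> number of rounds it ran.
    All names and numbers here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    # Each hypothetical trial has a "quality" its simulated score approaches.
    quality = {t: rng.uniform(0.6, 0.95) for t in range(n_trials)}

    def score(trial, rounds):
        # Stand-in for a validation score after `rounds` boosting rounds.
        return quality[trial] * (1 - 0.5 ** (rounds / 50))

    alive = list(range(n_trials))
    stopped_at = {}
    for rung in rungs:
        # Pretend every surviving trial was fitted incrementally up to `rung`,
        # then rank the survivors by their score at that point.
        ranked = sorted(alive, key=lambda t: score(t, rung), reverse=True)
        keep = max(1, len(ranked) // reduction_factor)
        for t in ranked[keep:]:
            stopped_at[t] = rung  # abandoned at this rung
        alive = ranked[:keep]
    for t in alive:
        stopped_at[t] = max_rounds  # never stopped: runs the full budget
    return stopped_at


history = successive_halving()
for trial, rounds in sorted(history.items()):
    print(f"trial {trial:2d} ran {rounds} rounds")
```

With 20 trials and a reduction factor of 3, this toy rule stops 14 trials at round 10, 4 at round 30, 1 at round 90, and lets one trial run the full 500 rounds. A real scheduler makes these decisions asynchronously as results arrive, so the actual counts in Tune will differ.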

Does that help?

royalts · November 26, 2020, 8:29am

That helped a lot! Mostly consolidated the understanding I had cobbled together from various docs. I hope I can ask some more clarifying questions:

So in (2) each trial will run xgboost for m boosting rounds, then check the score (which score, on which dataset? mean test?) and then make a termination decision. If it is never terminated, will the trial run for the full n_estimators rounds? How do I set m?
Is there a way to get at the trial histories through things other than verbose logs?

royalts · November 26, 2020, 5:03pm

Hmm, perhaps that lingo was a bit too xgboost-specific. What I’m interested in is how to set the batch size for each of the incremental fits and how to set the maximal total number of rounds.

rliaw · November 28, 2020, 6:14pm

Hmm,

To set “m”, you can set parameters for your scheduler.
If never terminated, yes, I think the trial will run until n_estimators rounds.
To get trial history, you can set a logger (see the TuneSearchCV(loggers=…) parameter).
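
As a rough illustration of what "parameters for your scheduler" can mean: Ray Tune's ASHAScheduler takes grace_period, reduction_factor, and max_t arguments. The sketch below assumes that the checkpoints ("rungs") at which trials are compared are spaced geometrically, starting at grace_period and multiplying by reduction_factor up to max_t. Treating "m" as grace_period is my reading rather than something stated in this thread, and the helper name is hypothetical, so check the scheduler docs for the exact behavior.

```python
def rung_milestones(grace_period, reduction_factor, max_t):
    """Iterations at which a successive-halving style scheduler compares trials.

    Assumes geometric spacing: grace_period, grace_period * reduction_factor, ...
    This is an illustrative helper, not part of Ray Tune.
    """
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


# Example: first check after 10 iterations, keep roughly 1/3 of trials
# at each rung, never train longer than 500 iterations.
print(rung_milestones(grace_period=10, reduction_factor=3, max_t=500))
```

Under these assumptions, a larger grace_period gives every trial more rounds before the first comparison, and a larger reduction_factor discards trials more aggressively at each rung.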

Hopefully that helps!
