[Tune] Iteration always 3 or 10

Hi all, I am using Ray Tune to tune XGBoost hyperparameters.
However, my tuning trials only ever reach iteration 3 or 10, even though I set max_t to 100. Below are my code and outputs. Most of my code is taken from the official example, so I hope that makes it easier to diagnose.

import os

import xgboost as xgb
from sklearn.model_selection import train_test_split
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.integration.xgboost import TuneReportCheckpointCallback

def train_multi_buyer(config):
    # This is a training function to be passed into Tune
    # Split into train and test set (baseline_data / baseline_label are defined at module level)
    X_train, X_test, y_train, y_test = train_test_split(
        baseline_data, baseline_label,
        test_size=0.2, stratify=baseline_label)
    # Build input matrices for XGBoost
    train_set = xgb.DMatrix(X_train, label=y_train)
    test_set = xgb.DMatrix(X_test, label=y_test)
    # Train the classifier; the Tune callback reports metrics and saves a checkpoint
    xgb.train(
        config,
        train_set,
        evals=[(test_set, "eval")],
        verbose_eval=False,
        callbacks=[TuneReportCheckpointCallback(filename="model.xgb")])
def get_best_model_checkpoint(analysis):
    best_bst = xgb.Booster()
    best_bst.load_model(os.path.join(analysis.best_checkpoint, "model.xgb"))
    auc = analysis.best_result["eval-auc"]
    accuracy = 1. - analysis.best_result["eval-error"]
    logloss = analysis.best_result["eval-logloss"]
    print(f"Best model parameters: {analysis.best_config}")
    print(f"Best model AUC: {auc:.3f}")
    print(f"Best model total accuracy: {accuracy:.3f}")
    print(f"Best model logloss: {logloss:.3f}")
    return best_bst
def tune_xgboost():
    search_space = {
        # You can mix constants with search space objects.
        "objective": "binary:logistic",
        "eval_metric": ["auc", "logloss", "error"],
        "max_depth": tune.randint(2, 11),
        "min_child_weight": tune.randint(1, 11),
        "subsample": tune.uniform(0.5, 1.0),
        "colsample_bytree": tune.uniform(0.5, 1.0),
        "eta": tune.loguniform(1e-3, 4e-1),
        "scale_pos_weight": tune.randint(1, 11),
        "gamma": tune.uniform(0, 0.5)
    }
    # This will enable early stopping of bad trials.
    scheduler = ASHAScheduler(
        max_t=100,  # 100 training iterations
        grace_period=3,
        reduction_factor=4)

    analysis = tune.run(
        train_multi_buyer,
        metric="eval-auc",
        mode="max",
        stop=None,
        # You can add "gpu": 0.1 to allocate GPUs
        resources_per_trial={"cpu": 2},
        config=search_space,
        num_samples=500,
        scheduler=scheduler,
        verbose=1)

    return analysis
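
I then just call the two functions, as in the official example:

# Driver, following the official Ray Tune XGBoost example
analysis = tune_xgboost()
best_bst = get_best_model_checkpoint(analysis)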

Here is part of the output:

I assume the trials stopping at iteration 3 are caused by my grace period of 3, but I am not sure why the remaining trials always stop at exactly 10 iterations and never any other number.

Thanks in advance to anyone who puts effort into answering this question :slight_smile:

@kaihaofan You could also just try disabling the Tune scheduler for now?

Thanks @rliaw for your swift response, and sorry for my late reply.
When I disabled the Tune scheduler, all trials ran for exactly 10 iterations.

def train_multi_buyer(config):
    # This is a training function to be passed into Tune
    # Split into train and test set
    X_train, X_test, y_train, y_test = train_test_split(
        baseline_data, baseline_label,
        test_size=0.2, stratify=baseline_label)
    # Build input matrices for XGBoost
    train_set = xgb.DMatrix(X_train, label=y_train)
    test_set = xgb.DMatrix(X_test, label=y_test)
    # Train the classifier, using the Tune callback
    xgb.train(
        config,
        train_set,
        evals=[(test_set, "eval")],
        verbose_eval=False,
        callbacks=[TuneReportCheckpointCallback(filename="model.xgb")])
def get_best_model_checkpoint(analysis):
    best_bst = xgb.Booster()
    best_bst.load_model(os.path.join(analysis.best_checkpoint, "model.xgb"))
    auc = analysis.best_result["eval-auc"]
    accuracy = 1. - analysis.best_result["eval-error"]
    logloss = analysis.best_result["eval-logloss"]
    print(f"Best model parameters: {analysis.best_config}")
    print(f"Best model AUC: {auc:.3f}")
    print(f"Best model total accuracy: {accuracy:.3f}")
    print(f"Best model logloss: {logloss:.3f}")
    return best_bst
def tune_xgboost():
    search_space = {
        # You can mix constants with search space objects.
        "objective": "binary:logistic",
        "eval_metric": ["auc", "logloss", "error"],
        "max_depth": tune.randint(2, 11),
        "min_child_weight": tune.randint(1, 11),
        "subsample": tune.uniform(0.5, 1.0),
        "colsample_bytree": tune.uniform(0.5, 1.0),
        "eta": tune.loguniform(1e-3, 4e-1),
        "scale_pos_weight": tune.randint(1, 11),
        "gamma": tune.uniform(0, 0.5)
    }

    analysis = tune.run(
        train_multi_buyer,
        metric="eval-auc",
        mode="max",
        stop=None,
        # You can add "gpu": 0.1 to allocate GPUs
        resources_per_trial={"cpu": 2},
        config=search_space,
        num_samples=20,
        verbose=1)

    return analysis

and here is the output:

Same issue for me… always 10 iterations…

Hi @kaihaofan,

the number of training iterations in XGBoost is determined by the num_boost_round parameter of xgb.train: Python API Reference — xgboost 1.6.1 documentation

This defaults to 10. So if you want to train for 100 iterations instead, you'll have to set num_boost_round=100 explicitly.
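
In your training function that would look like this (everything except the num_boost_round line is taken from your code):

xgb.train(
    config,
    train_set,
    num_boost_round=100,  # defaults to 10; match the scheduler's max_t
    evals=[(test_set, "eval")],
    verbose_eval=False,
    callbacks=[TuneReportCheckpointCallback(filename="model.xgb")])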

The 3 is because of the early stopping: with grace_period=3, badly performing trials are stopped after 3 iterations, while well performing trials run until the end, which is bounded by the num_boost_round parameter. Since that bound is 10 by default, no trial ever reaches the scheduler's next decision point, so the surviving trials all finish at exactly 10 iterations.
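
For intuition, ASHA's stopping decisions happen at rung milestones that grow geometrically from the grace period. A minimal sketch of the milestone schedule for your settings, assuming the usual grace_period * reduction_factor^k rule (not lifted from the Ray source):

grace_period, reduction_factor, max_t = 3, 4, 100

# Rungs at which ASHA decides whether to stop a trial
milestone = grace_period
while milestone <= max_t:
    print(milestone)  # prints 3, 12, 48
    milestone *= reduction_factor

So once you train with num_boost_round=100, you should also see trials stopped at 12 and 48, not just at 3.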

Does this make sense?