Tune Performance issue with LightGBM predict

Hello, I’m wondering if anybody has faced similar performance issues when using LightGBM with Ray Tune.


OS: Windows 10, version 21H2
Hardware: 12 CPUs, 224 GB RAM, 2 GPUs
Python: 3.9.13
Ray: 2.0.1

What is the problem?

I’m using Ray to tune a trainable function that iterates over 3 folds. For each fold, I fit a LightGBM model and report several metrics. Tuning performance degrades immensely when I use the LightGBM booster to predict on the validation set for each fold.


Approx. train_data shape (4538400, 278)
Approx. val_data shape (1637608, 278)

Trainable Function

import lightgbm as lgb
from ray.air import session

def lgm_cv_model(config, data=None, idx=None):
    for train_index, val_index in idx:
        train_data, train_label = data[x_features].iloc[train_index], data[y_target].iloc[train_index]
        val_data, val_label = data[x_features].iloc[val_index], data[y_target].iloc[val_index]
        lgb_train = lgb.Dataset(train_data, label=train_label)
        lgb_val = lgb.Dataset(val_data, label=val_label)

        # Training call truncated in the original post; arguments beyond
        # config and the datasets are elided.
        gbm = lgb.train(config, lgb_train, valid_sets=[lgb_val])

        yhat_val = gbm.predict(val_data)
        y_val = val_label
        # Calculate metrics from y_val / yhat_val, then report them
        session.report({"best_iter": gbm.best_iteration})

Tuning Code

from ray import tune
from ray.air import RunConfig
from ray.tune import Tuner, TuneConfig

# TuneConfig arguments (search algorithm, num_samples, etc.) truncated
# in the original post.
tune_config = TuneConfig()

run_config = RunConfig(verbose=3)

# data/idx passed via tune.with_parameters (assumed from the trainable's
# signature); only the resources line appeared in the original post.
tuner = Tuner(
    tune.with_resources(
        tune.with_parameters(lgm_cv_model, data=data, idx=idx),
        resources={"cpu": 6, "gpu": 1},
    ),
    tune_config=tune_config,
    run_config=run_config,
)

result_grid = tuner.fit()

When running the above code without gbm.predict(val_data) I get the following result:

and when running the code with gbm.predict(val_data) I get the following result:

Could you add wall timers around lgb.train and gbm.predict? You can report those as part of the metrics so they show up in the trial table.
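For example, something like this (just a sketch; `timed` is a hypothetical helper of mine, while `lgb`, `gbm`, `config`, and `session` are the names from your snippet):

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, wall-clock seconds elapsed)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Inside the trainable, roughly:
#   gbm, train_s = timed(lgb.train, config, lgb_train, valid_sets=[lgb_val])
#   yhat_val, predict_s = timed(gbm.predict, val_data)
#   session.report({"train_s": train_s, "predict_s": predict_s,
#                   "best_iter": gbm.best_iteration})
```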

Another thing to poke at is your machine utilization. Could there be any resource contention?

If you were to run only one trial through lgm_cv_model, do you still see the slowdown between the runs with and without gbm.predict?

Hi @xwjiang2010, thanks for your quick response!

I ran the same hyperparameters with only one trial at a time (using 12 CPUs & 2 GPUs); here are the results:

In terms of execution time, this drastically underperforms running the same hyperparameters for the same model outside of the Ray Tune framework. I can upload the results for this soon.

It seems the time cost of prediction is the bottleneck. As far as I can tell, LightGBM uses all available cores during prediction by default, and something must be going wrong here under Ray Tune. When running one trial at a time I set num_threads=12; for two simultaneous trials, num_threads=6.
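For reference, here is roughly how I derive the thread count from the trial's CPU allotment (the helper is my own; as I understand it, extra keyword arguments to Booster.predict are forwarded to LightGBM's prediction parameters, so num_threads can also be overridden there):

```python
# Hypothetical helper: pin LightGBM's num_threads to the CPUs Tune allots
# each trial, so concurrent trials don't oversubscribe cores.
def lgbm_thread_params(cpus_per_trial):
    return {"num_threads": int(cpus_per_trial)}

# Roughly how it plugs into the names from my snippet:
#   params = {**config, **lgbm_thread_params(6)}
#   gbm = lgb.train(params, lgb_train)
#   yhat_val = gbm.predict(val_data, num_threads=6)  # predict-time override
```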

How would I go about investigating resource contention?