Hello, I’m wondering if anybody has faced a performance issue similar to the one I’m currently facing.
System
OS: Windows 10, version 21H2
Hardware: 12 CPUs, 224 GB RAM, 2 GPUs
Python: 3.9.13
Ray: 2.0.1
What is the problem?
I’m using Ray Tune to tune a trainable function that iterates over 3 folds. For each fold, I fit a LightGBM model and report several metrics. Tuning performance degrades immensely when I use the LightGBM booster to predict the validation set for each fold.
Data
Approx. train_data shape (4538400, 278)
Approx. val_data shape (1637608, 278)
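For a rough sense of scale, assuming float64 values (an assumption on my side; the actual dtypes may differ), the per-fold arrays would occupy roughly:

```python
# Back-of-the-envelope memory footprint, assuming 8-byte (float64) values
train_bytes = 4_538_400 * 278 * 8  # ≈ 10.1 GB
val_bytes = 1_637_608 * 278 * 8    # ≈ 3.6 GB
print(train_bytes / 1e9, val_bytes / 1e9)
```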
Trainable Function
```python
import lightgbm as lgb
from ray.air import session


def lgm_cv_model(config, data=None, idx=None):
    # x_features and y_target are defined in the enclosing scope.
    for train_index, val_index in idx:
        train_data, train_label = data[x_features].iloc[train_index], data[y_target].iloc[train_index]
        val_data, val_label = data[x_features].iloc[val_index], data[y_target].iloc[val_index]
        lgb_train = lgb.Dataset(train_data, label=train_label)
        lgb_val = lgb.Dataset(val_data, label=val_label)
        gbm = lgb.train(
            params=config,
            train_set=lgb_train,
            valid_sets=[lgb_val],
            valid_names=["eval"],
        )
        # This predict call is what makes the trials slow down dramatically.
        yhat_val = gbm.predict(val_data)
        y_val = val_label
        # Calculate metrics from y_val / yhat_val here
        session.report({"best_iter": gbm.best_iteration})
```
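One thing I plan to check (a minimal standalone sketch, not part of my actual code): LightGBM defaults to using all cores, so two concurrent trials on a 12-CPU machine could oversubscribe threads during training and prediction. Capping LightGBM's `num_threads` parameter to the per-trial CPU budget and timing the predict call should show whether that is the bottleneck:

```python
import time

import lightgbm as lgb
import numpy as np

# Synthetic data with the same width (278 features); row counts are
# scaled down so the check runs quickly.
rng = np.random.default_rng(42)
X_train = rng.random((100_000, 278))
y_train = rng.random(100_000)
X_val = rng.random((50_000, 278))

# Cap LightGBM to the 6 CPUs each trial is allotted.
params = {"objective": "regression", "num_threads": 6, "verbosity": -1}
gbm = lgb.train(params, lgb.Dataset(X_train, label=y_train), num_boost_round=100)

start = time.perf_counter()
yhat = gbm.predict(X_val)
print(f"predict took {time.perf_counter() - start:.2f}s for {X_val.shape[0]:,} rows")
```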
Tuning Code
```python
from ray import tune
from ray.air import RunConfig
from ray.tune import Tuner, TuneConfig
from ray.tune.search.basic_variant import BasicVariantGenerator

tune_config = TuneConfig(
    search_alg=BasicVariantGenerator(random_state=42),
    num_samples=10,
)
run_config = RunConfig(verbose=3)
tuner = Tuner(
    trainable=tune.with_resources(
        tune.with_parameters(
            lgm_cv_model,
            data=data,
            idx=indicies,
        ),
        resources={"cpu": 6, "gpu": 1},
    ),
    param_space=search_space,
    tune_config=tune_config,
    run_config=run_config,
)
result_grid = tuner.fit()
```
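With these resources, Tune should run two trials concurrently (12 CPUs / 6 per trial, 2 GPUs / 1 per trial). As a sanity check of what Ray actually sees (a diagnostic snippet, not part of the run above):

```python
import ray

ray.init()
print(ray.cluster_resources())    # expect something like {'CPU': 12.0, 'GPU': 2.0, ...}
print(ray.available_resources())  # whatever is not claimed by running trials
```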
When running the above code without gbm.predict(val_data), I get the result shown in the attached screenshot. When running the code with gbm.predict(val_data), it runs for more than 1 day. Unfortunately, I can only upload one image.
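To narrow this down further, I could also time the validation predictions in chunks inside the trainable and watch whether throughput stays constant or collapses (a sketch that assumes the `gbm` booster and `val_data` DataFrame from the trainable above):

```python
import time

chunk_size = 100_000
for start in range(0, len(val_data), chunk_size):
    chunk = val_data.iloc[start:start + chunk_size]
    t0 = time.perf_counter()
    gbm.predict(chunk)
    rate = len(chunk) / (time.perf_counter() - t0)
    print(f"rows {start}-{start + len(chunk)}: {rate:,.0f} rows/s")
```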