How to set the metric?

I’m trying to set up Ray Tune in a Kaggle kernel. It kind of works, but I’m not sure I’m doing it right; I’m getting a bunch of warnings. Could you please clarify a few questions?

  1. There are several places where I can set a metric: inside xgb.train, config, search_alg, scheduler, or tune.run. Where should I set the metric if I’d like to use logloss here, and where should I set mode="min" for it? The documentation says the metric from tune.run is passed to the search algorithm and the scheduler, but what about xgb.train? The warning says it uses the default.

  2. How do I force tune.run to use cross-validation when computing the metric?

  3. How can I fix the warning Parameters: { "n_estimators" } might not be used?

  4. If I’d like to use all 4 available CPU cores, is it enough to set max_concurrent=4 and resources_per_trial={"cpu": 1}, or do I need to add anything else?

  5. Does placing pd.read_csv inside the load_and_train function mean that every trial will read the data again?

%%time
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from ray import tune
from ray.tune.integration.xgboost import TuneReportCallback
from ray.tune.schedulers import ASHAScheduler
from ray.tune.suggest.hebo import HEBOSearch

def load_and_train(config: dict):
    data = pd.read_csv('/kaggle/input/bioresponse/train.csv')
    labels = data['Activity']
    data = data.drop(['Activity'], axis=1)
    
    train_x, test_x, train_y, test_y = train_test_split(
        data, labels,
        test_size=0.2,
        random_state=seed,
        stratify=labels
    )
    
    train_set = xgb.DMatrix(train_x, label=train_y)
    test_set = xgb.DMatrix(test_x, label=test_y)
    
    xgb.train(
        config,
        train_set,
        evals=[(test_set, 'eval')],
        verbose_eval=False,
        callbacks=[TuneReportCallback()]  # reports each round's eval results back to Tune
    )

seed = 0
search_space = {
    "n_estimators": tune.randint(100, 1101),
    "objective": "binary:logistic",
    "max_depth": tune.randint(1, 9),
    "min_child_weight": tune.choice([1, 2, 3]),
    "subsample": tune.uniform(0.5, 1.0),
    "eta": tune.loguniform(1e-4, 1e-1)
}

analysis = tune.run(
    load_and_train,
    num_samples=10,
    metric="eval-logloss",
    mode="min",
    config=search_space,
    search_alg=HEBOSearch(
        random_state_seed=seed,
        max_concurrent=4
    ),
    scheduler=ASHAScheduler(
        max_t=10,
        grace_period=1,
        reduction_factor=2
    ),
    resources_per_trial={"cpu": 1},
    local_dir='xgboost',
    verbose=2
)
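
A minimal sketch of where the metric could live, assuming the warnings come from xgboost’s native API (the eval_metric key and the num_boost_round rename are my reading of the warnings, not something confirmed in this thread):

search_space = {
    "objective": "binary:logistic",
    # xgb.train only computes the metrics listed here; without this key it
    # falls back to the objective's default metric, which is what the
    # warning is about.
    "eval_metric": ["logloss"],
    "max_depth": tune.randint(1, 9),
    "min_child_weight": tune.choice([1, 2, 3]),
    "subsample": tune.uniform(0.5, 1.0),
    "eta": tune.loguniform(1e-4, 1e-1),
}
# metric="eval-logloss" and mode="min" in tune.run are forwarded to the
# search algorithm and scheduler, and they now match the "eval-logloss" key
# that TuneReportCallback reports from the (test_set, 'eval') pair.
# As for the n_estimators warning: xgb.train does not read n_estimators
# from the params dict (that name belongs to the scikit-learn wrapper);
# the native equivalent is the num_boost_round argument of xgb.train itself.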

Is there anybody alive?

Hey @asin,

Maybe you’d have an easier time using Tune-sklearn (a scikit-learn API for Ray Tune)? See GitHub - ray-project/tune-sklearn: a drop-in replacement for scikit-learn’s GridSearchCV / RandomizedSearchCV with cutting-edge hyperparameter tuning techniques.

here’s an example (see the sketch after this list):

  1. It will simplify the metric definition.
  2. It has cross-validation implemented.
  3. Not sure about the n_estimators parameter warning.
  4. Data in tune-sklearn is automatically put into the shared-memory object store, so you only need to read it once.
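
A minimal sketch of what that could look like (the exact TuneSearchCV arguments, n_trials, scoring, cv, search_optimization, and the tuple-range format are my recollection of the tune-sklearn API, so treat this as an assumption rather than verified code):

import pandas as pd
from xgboost import XGBClassifier
from tune_sklearn import TuneSearchCV

# read the data once, outside the trials
data = pd.read_csv('/kaggle/input/bioresponse/train.csv')
labels = data['Activity']
data = data.drop(['Activity'], axis=1)

search = TuneSearchCV(
    XGBClassifier(objective="binary:logistic"),
    param_distributions={
        "n_estimators": (100, 1100),   # n_estimators is meaningful here,
        "max_depth": (1, 8),           # since this is the sklearn wrapper
        "subsample": (0.5, 1.0),
        "learning_rate": (1e-4, 1e-1),
    },
    n_trials=10,
    scoring="neg_log_loss",  # sklearn scorer; the "neg" makes larger better
    cv=5,                    # cross-validation folds, addressing question 2
    search_optimization="bayesian",
    random_state=0,
)
search.fit(data, labels)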

Thank you for your answer.
As far as I know, Tune-sklearn does not support HEBOSearch; that’s why I use tune.run like this. I’ve read all the documentation but didn’t find the cross-validation settings (shuffle, random_state, stratify, n_splits). Where can I find them?

It should support HEBOSearch now; see tune-sklearn/custom_searcher_example.py at master in ray-project/tune-sklearn on GitHub.
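
Building on that link, a sketch of combining a custom searcher with explicit cross-validation settings. It assumes TuneSearchCV accepts a Searcher instance for search_optimization (as in the linked example), an sklearn splitter for the cv argument, and Tune distributions in param_distributions; all three are assumptions about the API rather than something stated in this thread:

from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier
from tune_sklearn import TuneSearchCV
from ray import tune
from ray.tune.suggest.hebo import HEBOSearch

seed = 0

# shuffle, random_state, stratification, and n_splits all live on the splitter
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

search = TuneSearchCV(
    XGBClassifier(objective="binary:logistic"),
    param_distributions={
        "n_estimators": tune.randint(100, 1101),
        "max_depth": tune.randint(1, 9),
    },
    search_optimization=HEBOSearch(random_state_seed=seed),
    cv=cv,
    scoring="neg_log_loss",
    n_trials=10,
)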