Checkpointing using the Trainable Class API and XGBoost


I am attempting to use Ray Tune with XGBoost via the Trainable Class API and am having some trouble with how to implement checkpointing.

Here is the code I have so far for the trainable class.

import os

import numpy as np
import xgboost
from ray.tune import Trainable
from sklearn.utils import shuffle


class ANTrainable(Trainable):

    def setup(self, config, data_obj_id=None):
        df = data_obj_id
        length = len(df)
        len_test = int(0.1 * length)
        len_train = length - len_test
        df_train = df.head(len_train)
        df_train = shuffle(df_train)
        y_train = [1 if val > 0 else 0 for val in df_train['counts'].values]
        y_train = np.array(y_train)
        df_train = df_train.drop(['datetime', 'counts'], axis=1)
        X_train = df_train.values
        df_test = df.tail(len_test)
        y_test = [1 if val > 0 else 0 for val in df_test['counts'].values]
        y_test = np.array(y_test)
        self.y_test = y_test

        df_test = df_test.drop(['datetime', 'counts'], axis=1)
        X_test = df_test.values
        self.X_test = X_test
        self.config = config
        self.train_set = xgboost.DMatrix(X_train, y_train)
        self.test_set = xgboost.DMatrix(X_test, y_test)
        # The booster is produced in step(); start with no model so the
        # first call to xgboost.train() trains from scratch.
        self.model = None

    def reset_config(self, new_config):
        self.config = new_config
        return True

    def step(self):
        evals_result = {}
        # Train one boosting round per Tune iteration, continuing from the
        # booster produced by the previous step (None on the first step).
        bst = xgboost.train(
            self.config,
            self.train_set,
            num_boost_round=1,
            evals=[(self.test_set, "eval")],
            evals_result=evals_result,
            xgb_model=self.model,
        )
        self.model = bst

        return {
            'aucpr': evals_result['eval']['aucpr'][-1],
            'auc': evals_result['eval']['auc'][-1],
            'logloss': evals_result['eval']['logloss'][-1],
            'error': evals_result['eval']['error'][-1],
        }

    def save_checkpoint(self, tmp_checkpoint_dir):
        checkpoint_path = os.path.join(tmp_checkpoint_dir, "model.xgb")
        self.model.save_model(checkpoint_path)
        return tmp_checkpoint_dir

    def load_checkpoint(self, tmp_checkpoint_dir):
        checkpoint_path = os.path.join(tmp_checkpoint_dir, "model.xgb")
        bst = xgboost.Booster()
        # load_model() returns None, so load in place and keep the booster.
        bst.load_model(checkpoint_path)
        self.model = bst

And here is the code for the search space config and the scheduler:

        self.config = {
            "tree_method": "hist",
            "objective": "binary:logistic",
            "eval_metric": ["aucpr", "auc", "logloss", "error"],
            "eta": tune.loguniform(1e-4, 1),
            "subsample": tune.uniform(0.1, 1.0),
            "colsample_bytree": tune.uniform(0.1, 1.0),
            "max_depth": tune.randint(3, 10),
            "gamma": tune.loguniform(0.01, 1),
            "min_child_weight": tune.uniform(1, 7),
        }
        print('Running Tune Step')
        self.analysis = tune.run(
            tune.with_parameters(ANTrainable, data_obj_id=self.an_model_input_data),
            config=self.config,
            scheduler=ASHAScheduler(
                max_t=10,  # training iterations
            ),
        )

I feel like there should be a way to take advantage of the Tune XGBoost callbacks, but I'm not entirely sure how to go about it.

Any help would be much appreciated, and if anyone has an example that would be even better!!

Hi @nikhil, the Tune XGBoost callbacks actually only work with Tune's functional API, and I would recommend using that over the class API. It'll require just some minor refactoring of your code: instead of the ANTrainable class you would have a function like def train(config, checkpoint_dir=None, data_obj_id=None).
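
Roughly, the refactored function could look like this (an untested sketch — the train_an_model name and the inlined preprocessing are just adapted from your setup(), so adjust as needed):

import os

import numpy as np
import xgboost as xgb
from sklearn.utils import shuffle


def train_an_model(config, checkpoint_dir=None, data_obj_id=None):
    # Same preprocessing as in your setup(), inlined here.
    df = data_obj_id
    len_test = int(0.1 * len(df))
    df_train = shuffle(df.head(len(df) - len_test))
    df_test = df.tail(len_test)

    def to_dmatrix(d):
        y = np.array([1 if val > 0 else 0 for val in d["counts"].values])
        return xgb.DMatrix(d.drop(["datetime", "counts"], axis=1).values, label=y)

    train_set, test_set = to_dmatrix(df_train), to_dmatrix(df_test)

    # checkpoint_dir is set when Tune restores a paused or failed trial;
    # resume boosting from the saved model in that case.
    xgb_model = None
    if checkpoint_dir:
        xgb_model = xgb.Booster()
        xgb_model.load_model(os.path.join(checkpoint_dir, "model.xgb"))

    # ... xgb.train call with the Tune callback goes here (see below).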

Then, when you call xgb.train, you can pass in the TuneReportCheckpointCallback just like in this guide: Tuning XGBoost parameters — Ray v1.4.1.
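
Putting it together (again a sketch; the eval-aucpr metric name assumes the callback's default reporting, which prefixes the eval set name, so double-check against the guide):

from ray import tune
from ray.tune.integration.xgboost import TuneReportCheckpointCallback
from ray.tune.schedulers import ASHAScheduler

# At the end of train_an_model(): the callback reports each round's eval
# metrics to Tune and writes a checkpoint file, replacing the manual
# step()/save_checkpoint()/load_checkpoint() logic from the class version.
xgb.train(
    config,
    train_set,
    evals=[(test_set, "eval")],
    xgb_model=xgb_model,
    verbose_eval=False,
    callbacks=[TuneReportCheckpointCallback(filename="model.xgb")],
)

# The driver then becomes:
analysis = tune.run(
    tune.with_parameters(train_an_model, data_obj_id=an_model_input_data),
    config=config,  # your search space from above
    metric="eval-aucpr",
    mode="max",
    scheduler=ASHAScheduler(max_t=10),
)

Since the callback reports once per boosting round, max_t=10 in the ASHAScheduler then corresponds to 10 boosting rounds.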

Let me know if this works for you or if you have any other questions!