Ray Tune does not save checkpoint information under the experiment path

Hi,

I am trying to use ray.tune (2.10.0) with LightGBMTrainer on macOS with an M3 chip. The key Python source for the checkpoint configuration is as follows:

    # Imports (Ray 2.10 module paths)
    from ray import tune
    from ray.train import CheckpointConfig, RunConfig, ScalingConfig
    from ray.train.lightgbm import LightGBMTrainer
    from ray.tune import TuneConfig
    from ray.tune.schedulers import ASHAScheduler
    from ray.tune.search.hyperopt import HyperOptSearch
    from ray.tune.stopper import CombinedStopper, MaximumIterationStopper, TimeoutStopper

    # train_dataset, valid_dataset, and the search-space dict `config` are defined elsewhere.

    # Trainable: each Tune trial builds and fits its own LightGBMTrainer.
    def lgbm_train(config, train_data, valid_data):
        trainer = LightGBMTrainer(
            run_config=RunConfig(
                checkpoint_config=CheckpointConfig(
                    num_to_keep=1,
                    checkpoint_frequency=1,
                    checkpoint_score_attribute="valid-auc",
                    checkpoint_score_order="max",
                ),
            ),
            scaling_config=ScalingConfig(
                num_workers=config["num_workers"],
                use_gpu=False,
            ),
            label_column=config["label_column"],
            num_boost_round=config["num_iterations"],
            params=config,
            datasets={"train": train_data, "valid": valid_data},
        )
        result = trainer.fit()
        return {
            "train-auc": result.metrics["train-auc"],
            "valid-auc": result.metrics["valid-auc"],
        }

    # Stop a trial after 100 iterations or 10 minutes, whichever comes first.
    stopper = CombinedStopper(
        MaximumIterationStopper(max_iter=100),
        TimeoutStopper(10 * 60),
    )
    ckpconfig = CheckpointConfig(
        num_to_keep=1,
        checkpoint_frequency=1,
        checkpoint_score_attribute="valid-auc",
        checkpoint_score_order="max",
    )
    scheduler = ASHAScheduler()
    algo = HyperOptSearch()

    tuner = tune.Tuner(
        tune.with_parameters(lgbm_train, train_data=train_dataset, valid_data=valid_dataset),
        param_space=config,
        tune_config=TuneConfig(
            reuse_actors=True,
            max_concurrent_trials=4,
            metric="valid-auc",
            mode="max",
            scheduler=scheduler,
            search_alg=algo,
            num_samples=10,
        ),
        run_config=RunConfig(
            name="lgtgbm_tuner",
            stop=stopper,
            checkpoint_config=ckpconfig,
        ),
    )
    results = tuner.fit()

    bst_rlt = results.get_best_result(
        metric="valid-auc", mode="max", scope="last", filter_nan_and_inf=True
    )
    ckp_data = bst_rlt.get_best_checkpoint("valid-auc", "max").get_metadata()

The last line above fails with:

**RuntimeError: No checkpoint exists in the trial directory!**

    df = results.get_dataframe()
    print(df)

The output shows that checkpoint_dir_name is None for every trial.

Under the ray_results directory there is a lgtgbm_tuner directory, which I think is created by the tuner. Under lgtgbm_tuner there are many trial subdirectories, none of which contains a checkpoint subdirectory. The trial subdirectories have names like this:

lgbm_train_28e886a6_8_bagging_fraction=0.9786,boosting_type=gbdt,feature_fraction=0.7612,label_column=target,learning_rate=0.0338,_2024-04-07_10-07-44

However, there are also many trial-like subdirectories directly under the top-level ray_results directory, with names like LightGBMTrainer_2024-04-07_10-07-03. Each of them has a subdirectory with a name like LightGBMTrainer_86ac2_00000_0_2024-04-07_10-07-03, and that one does contain checkpoint directories such as checkpoint_000062.
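
To double-check which directories actually hold checkpoints, a small standard-library sketch like the one below could be used. It assumes only that checkpoint folders are named `checkpoint_*` (as in checkpoint_000062 above). `find_checkpoint_dirs` is a hypothetical helper written for this post, not part of Ray.

```python
from pathlib import Path


def find_checkpoint_dirs(results_root):
    """Map each directory under results_root to the checkpoint_* folders directly inside it.

    Keys are paths relative to results_root; values are sorted folder names.
    Assumption: checkpoint folders follow the "checkpoint_*" naming pattern.
    """
    root = Path(results_root).expanduser()
    if not root.is_dir():
        # Nothing to scan if the results directory does not exist.
        return {}
    found = {}
    for ckpt in sorted(root.rglob("checkpoint_*")):
        if ckpt.is_dir():
            parent = str(ckpt.parent.relative_to(root))
            found.setdefault(parent, []).append(ckpt.name)
    return found
```

Calling `find_checkpoint_dirs("~/ray_results")` (assuming that is where the results are written) and printing the returned dictionary lists only the directories that contain checkpoints, which makes it easy to see whether any of them sit under lgtgbm_tuner.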

Can you help point out what I have missed in the checkpoint settings? Thanks.