Hi,
I am trying to use Ray Tune (2.10.0) with `LightGBMTrainer` on macOS (M3). The key Python source for the checkpoint configuration is as follows:
```python
from ray import tune
from ray.train import CheckpointConfig, RunConfig, ScalingConfig
from ray.train.lightgbm import LightGBMTrainer
from ray.tune import TuneConfig
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.hyperopt import HyperOptSearch
from ray.tune.stopper import CombinedStopper, MaximumIterationStopper, TimeoutStopper


def lgbm_train(config, train_data, valid_data):
    trainer = LightGBMTrainer(
        run_config=RunConfig(
            checkpoint_config=CheckpointConfig(
                num_to_keep=1,
                checkpoint_frequency=1,
                checkpoint_score_attribute="valid-auc",
                checkpoint_score_order="max",
            )
        ),
        scaling_config=ScalingConfig(
            num_workers=config["num_workers"],
            use_gpu=False,
        ),
        label_column=config["label_column"],
        num_boost_round=config["num_iterations"],
        params=config,
        datasets={"train": train_data, "valid": valid_data},
    )
    result = trainer.fit()
    return {
        "train-auc": result.metrics["train-auc"],
        "valid-auc": result.metrics["valid-auc"],
    }


stopper = CombinedStopper(
    MaximumIterationStopper(max_iter=100),
    TimeoutStopper(10 * 60),
)
ckpconfig = CheckpointConfig(
    num_to_keep=1,
    checkpoint_frequency=1,
    checkpoint_score_attribute="valid-auc",
    checkpoint_score_order="max",
)
scheduler = ASHAScheduler()
algo = HyperOptSearch()
tuner = tune.Tuner(
    tune.with_parameters(lgbm_train, train_data=train_dataset, valid_data=valid_dataset),
    param_space=config,
    tune_config=TuneConfig(
        reuse_actors=True,
        max_concurrent_trials=4,
        metric="valid-auc",
        mode="max",
        scheduler=scheduler,
        search_alg=algo,
        num_samples=10,
    ),
    run_config=RunConfig(
        name="lgtgbm_tuner",
        stop=stopper,
        checkpoint_config=ckpconfig,
    ),
)
results = tuner.fit()
bst_rlt = results.get_best_result(
    metric="valid-auc", mode="max", scope="last", filter_nan_and_inf=True
)
ckp_data = bst_rlt.get_best_checkpoint("valid-auc", "max").get_metadata()
```
The `get_best_checkpoint` call on the last line fails with:

```
RuntimeError: No checkpoint exists in the trial directory!
```
```python
df = results.get_dataframe()
print(df)
```
This output shows that `checkpoint_dir_name` is `None` for every trial.
Under the `ray_results` directory there is a `lgtgbm_tuner` directory, which I believe was created by the Tuner. Under `lgtgbm_tuner` there are many trial subdirectories, none of which contain any checkpoint subdirectory. The trial subdirectory names look like this:

`lgbm_train_28e886a6_8_bagging_fraction=0.9786,boosting_type=gbdt,feature_fraction=0.7612,label_column=target,learning_rate=0.0338,_2024-04-07_10-07-44`

However, there are also many trial-like subdirectories directly under the top-level `ray_results` directory, with names like `LightGBMTrainer_2024-04-07_10-07-03`. Each of these has subdirectories with names like `LightGBMTrainer_86ac2_00000_0_2024-04-07_10-07-03`, and those *do* contain checkpoint directories such as `checkpoint_000062`.
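To double-check which trial directories actually contain checkpoints, I used a small helper like the sketch below. It only assumes the standard `~/ray_results` layout where checkpoints are saved as `checkpoint_*` subdirectories inside each trial directory; the path at the bottom is just an example.

```python
import os


def find_checkpoint_dirs(experiment_dir):
    """Map each trial subdirectory of an experiment to the checkpoint_*
    directories it contains (an empty list means no checkpoints)."""
    found = {}
    for trial in sorted(os.listdir(experiment_dir)):
        trial_path = os.path.join(experiment_dir, trial)
        if os.path.isdir(trial_path):
            found[trial] = sorted(
                d for d in os.listdir(trial_path) if d.startswith("checkpoint_")
            )
    return found


# Example usage (path is hypothetical):
# find_checkpoint_dirs(os.path.expanduser("~/ray_results/lgtgbm_tuner"))
```

Running this on `lgtgbm_tuner` confirms every trial maps to an empty list, while the top-level `LightGBMTrainer_*` experiment directories do contain `checkpoint_*` entries.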
Can you help point out what I have missed in the checkpoint settings? Thanks.