When I report results to the tuner both via train.report() AND via the normal function return, the tuner seems to read from the return value instead of from train.report(), and keeps giving me an error that it cannot find the metric specified in the tune_config of my tuner.
How can I make the tuner read from train.report() instead of the function return?
(The error goes away if I comment out the line return output.)
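Here is a minimal, self-contained sketch of the pattern I mean (placeholder names and dummy values, simplified from my real code below):
################################################################################
##### Minimal sketch of the pattern (placeholders, not my actual code) #####
################################################################################
from ray import train, tune
import numpy as np

def trainable_sketch(config):
    for epoch in range(3):
        loss = float(np.random.rand())
        train.report({"loss": loss})      # metric the tuner should read
    return {"history": np.zeros((3, 1))}  # extra arrays I also want back

tuner_sketch = tune.Tuner(
    trainable_sketch,
    tune_config=tune.TuneConfig(metric="loss", mode="min", num_samples=2),
)
tuner_sketch.fit()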
################################################################################
##### Last few lines of my training function #####
################################################################################
## Create a Ray Tune session report
## Passes the checkpoint data to Ray Tune
report = {
    "loss": np.mean(track_validation_loss[epoch, :]),
}
train.report(report, checkpoint=checkpoint_from_storage)
## Collect all the items into dictionary to return
## Update this into a 2D matrix to be able to track epoch and batch
output = {
    "Training Loss": track_training_loss,
    "Training TP": track_training_TP_count,
    "Training FP": track_training_FP_count,
    "Training TN": track_training_TN_count,
    "Training FN": track_training_FN_count,
    "Validation Loss": track_validation_loss,
    "Validation TP": track_validation_TP_count,
    "Validation FP": track_validation_FP_count,
    "Validation TN": track_validation_TN_count,
    "Validation FN": track_validation_FN_count,
}
return output
################################################################################
##### Tuner definition #####
################################################################################
## Tuner
tuner = tune.Tuner(
    tune.with_resources(
        tune.with_parameters(train_the_model),  # Tuner will use what is in param_space
        # resources={"cpu": psutil.cpu_count(logical=True)},  # Logical CPU units - this would oversubscribe and cause low CPU utilization
        resources={"cpu": psutil.cpu_count(logical=False),  # Physical CPU units
                   "gpu": torch.cuda.device_count()},
    ),
    tune_config=tune.TuneConfig(
        metric="loss",  # Can also put under scheduler
        mode="min",     # Can also put under scheduler
        scheduler=scheduler,
        num_samples=10,
    ),
    param_space=param_space["params"],
)
## Fit the tuner
results = tuner.fit()
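For reference, a sketch of how I expect to read the reported metric back after fitting (standard ResultGrid API, nothing custom on my side):
## Sketch: pick the best trial by the reported "loss" metric
best_result = results.get_best_result(metric="loss", mode="min")
print(best_result.metrics["loss"])  # last value reported via train.report()
print(best_result.config)           # hyperparameters of that trial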
