When I report results to the tuner using train.report() AND also return them via a normal function return, the tuner seems to read from the return value instead of from train.report(), and it keeps giving me an error that it cannot find the metric specified in the tune_config of my tuner.
How can I have the tuner read from the report instead of the function return?
(The error goes away if I comment out the line return output.)
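To make the failure easy to see, here is a minimal sketch of the pattern I am hitting (placeholder trainable and values, assuming the Ray 2.x API where train.report lives in ray.train):
################################################################################
##### Minimal sketch of the report-plus-return pattern #####
################################################################################
from ray import train, tune

def trainable(config):
    for step in range(3):
        ## The metric I want the tuner to consume
        train.report({"loss": 1.0 / (step + 1)})
    ## Returning a dict as well is what seems to shadow the reported metrics
    return {"some_other_key": 123}

tuner = tune.Tuner(
    trainable,
    tune_config=tune.TuneConfig(metric="loss", mode="min", num_samples=1),
)
tuner.fit()  ## Complains it cannot find "loss"; passes once the return is removed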
################################################################################
##### Last few lines of my training function #####
################################################################################
## Create a Ray Tune session report
## Passes the checkpoint data to Ray Tune
report = {
    "loss": np.mean(track_validation_loss[epoch, :]),
}
train.report(report, checkpoint=checkpoint_from_storage)
## Collect all the items into dictionary to return
## Update this into a 2D matrix to be able to track epoch and batch
output = {
    "Training Loss": track_training_loss,
    "Training TP": track_training_TP_count,
    "Training FP": track_training_FP_count,
    "Training TN": track_training_TN_count,
    "Training FN": track_training_FN_count,
    "Validation Loss": track_validation_loss,
    "Validation TP": track_validation_TP_count,
    "Validation FP": track_validation_FP_count,
    "Validation TN": track_validation_TN_count,
    "Validation FN": track_validation_FN_count,
}
return output
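If the answer is simply "do not return anything", I would still like to keep the tracking matrices for later analysis. This is my own workaround idea (not something I have confirmed against the docs): pickle output into the directory that backs the checkpoint, so nothing needs to be returned. It assumes checkpoint_from_storage can be replaced by a Checkpoint.from_directory checkpoint:
################################################################################
##### Workaround sketch: persist output inside the checkpoint, no return #####
################################################################################
import os
import pickle
import tempfile
from ray import train
from ray.train import Checkpoint

with tempfile.TemporaryDirectory() as tmp_dir:
    ## Write the tracking matrices next to whatever else the checkpoint holds
    with open(os.path.join(tmp_dir, "tracking_output.pkl"), "wb") as f:
        pickle.dump(output, f)
    ## Report the metric and the checkpoint together; with no return afterwards,
    ## the tuner only ever sees what train.report() hands it
    train.report(report, checkpoint=Checkpoint.from_directory(tmp_dir))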
################################################################################
##### Tuner definition #####
################################################################################
## Tuner
tuner = tune.Tuner(
    tune.with_resources(
        tune.with_parameters(train_the_model),  # Tuner will use what is in param_space
        # resources={"cpu": psutil.cpu_count(logical=True)} (logical CPU units) would oversubscribe and cause low CPU utilization
        resources={"cpu": psutil.cpu_count(logical=False),  # Physical CPU units
                   "gpu": torch.cuda.device_count()},
    ),
    tune_config=tune.TuneConfig(
        metric="loss",  # Can also be set on the scheduler
        mode="min",     # Can also be set on the scheduler
        scheduler=scheduler,
        num_samples=10,
    ),
    param_space=param_space["params"],
)
## Fit the tuner
results = tuner.fit()
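For completeness, this is how I expect to read the reported metric back out afterwards (ResultGrid.get_best_result is the Ray 2.x API; if the tuner only ever sees the returned dict, I assume this lookup would not find "loss" either):
## Inspect the metric reported via train.report() across all trials
best_result = results.get_best_result(metric="loss", mode="min")
print(best_result.metrics["loss"])  ## Last reported validation loss
print(best_result.config)           ## Hyperparameters of the best trial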