[Tune Class API + PyTorch] Custom metrics are not properly passed to ExperimentAnalysis and TensorBoard

I am using step() to return my losses like this:

return {'train_epoch_loss': train_epoch_loss.detach().numpy(),
        'validate_epoch_loss': validate_epoch_loss.detach().numpy()}

While the experiment runs, both metrics appear in the CLIReporter output together with the standard entries such as the hyperparameters and time_this_iter_s.

Once the trials terminate, both metrics also appear in progress.csv and result.json. However, they do not show up under “Scalars” or “Hparams” in TensorBoard (only under “Distributions”), and calling analysis.get_best_config(metric = 'validate_epoch_loss', mode = 'min', scope = 'last') only returns:

WARNING experiment_analysis.py:557 -- Could not find best trial. Did you pass the correct metric parameter?

Any idea how to resolve this?

Ray version: 1.0.1.post1
PyTorch version: 1.7.0

Possibly related: in my Torchv2CustomModel subclass, the metrics() method does not appear to work:

class MyModel(Torchv2CustomModel):
  bork = 1

  def metrics(self):
    return {'bork': self.bork}

Yet bork does not show up in TensorBoard.

I found my mistake: it was due to the type of my variable validate_epoch_loss. It was a NumPy array and not “just a number”, which Ray apparently has trouble handling internally. Once I accessed the array entry with myarray.item(), Ray’s functions accepted it as a metric and correctly identified the best config.
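
For anyone hitting the same issue, here is a minimal sketch of the corrected step(), assuming train_epoch_loss and validate_epoch_loss are scalar tensors as in the snippet above:

def step(self):
    # ... run one training epoch and one validation epoch ...
    # Report plain Python floats instead of 0-d NumPy arrays so Tune
    # treats them as scalar metrics in TensorBoard and ExperimentAnalysis.
    return {'train_epoch_loss': train_epoch_loss.detach().numpy().item(),
            'validate_epoch_loss': validate_epoch_loss.detach().numpy().item()}

With that change, analysis.get_best_config(metric = 'validate_epoch_loss', mode = 'min', scope = 'last') finds the best trial as expected.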