I’ve just managed to get Ray Train working, and I now want to track overall progress in the report.
I’m using torchmetrics to calculate an overall MSE for a dataset. How can I produce a single value per training epoch? I have to train in batches, so before using Ray I saved all outputs and labels into lists and concatenated them before calculating the metric. For example:
import torch
import torchmetrics

all_outputs = []
all_labels = []
for x, labels in train_loader:
    outputs = model(x, True)
    # Network has a single output, so squeeze to match the labels' shape
    outputs = torch.squeeze(outputs)
    all_outputs.append(outputs.detach())
    all_labels.append(labels)
# Concatenate the per-batch results, then compute one MSE over the whole epoch
all_outputs = torch.cat(all_outputs)
all_labels = torch.cat(all_labels)
mse = torchmetrics.functional.mean_squared_error(all_outputs, all_labels).item()
I have 2 workers running, so when I do this I currently end up with two MSE values in the report, one per worker. Is there any way to get a single value for the entire dataset instead of two?
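To make the goal concrete, here is a minimal sketch of the arithmetic involved, not a confirmed Ray Train recipe. It assumes each worker can report its own sum of squared errors and sample count; the dataset-wide MSE is then the total sum divided by the total count, which differs from averaging the two per-worker MSEs whenever the shards are unequal in size. The helper names `local_stats`, `global_mse`, and `all_reduce_mse` are hypothetical. The last one assumes a `torch.distributed` process group is already initialized on each worker and uses `torch.distributed.all_reduce` to sum the two numbers across workers.

```python
from typing import Iterable, Tuple


def local_stats(outputs: Iterable[float], labels: Iterable[float]) -> Tuple[float, int]:
    """Sum of squared errors and sample count for one worker's shard."""
    sse, n = 0.0, 0
    for o, y in zip(outputs, labels):
        sse += (o - y) ** 2
        n += 1
    return sse, n


def global_mse(stats: Iterable[Tuple[float, int]]) -> float:
    """Combine per-worker (sum of squared errors, count) pairs into one MSE."""
    total_sse, total_n = 0.0, 0
    for sse, n in stats:
        total_sse += sse
        total_n += n
    return total_sse / total_n


def all_reduce_mse(local_sse: float, local_n: int, device: str = "cpu") -> float:
    """Hypothetical helper: sum both statistics across workers, then divide.

    Assumes a torch.distributed process group is already initialized.
    With the NCCL backend the tensor must live on that worker's CUDA device.
    Without an initialized group this simply returns the local MSE.
    """
    import torch
    import torch.distributed as dist

    stats = torch.tensor([local_sse, float(local_n)], dtype=torch.float64, device=device)
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(stats, op=dist.ReduceOp.SUM)
    return (stats[0] / stats[1]).item()


if __name__ == "__main__":
    # Two unequal shards, standing in for the two workers
    worker_a = local_stats([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
    worker_b = local_stats([0.0] * 10, [3.0] * 10)
    print("dataset-wide MSE:", global_mse([worker_a, worker_b]))
    print("mean of per-worker MSEs:", (worker_a[0] / worker_a[1] + worker_b[0] / worker_b[1]) / 2)
```

Reducing only two scalars per worker avoids gathering every output and label onto one process. As far as I understand, the class-based `torchmetrics.MeanSquaredError` metric accumulates the same two totals and synchronizes them across processes when `compute()` is called in a distributed run, so it may be an alternative to doing the reduction by hand.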