I’ve just managed to get Ray Train working, and I now want to track overall progress in the report.
I’m using torchmetrics to calculate an overall MSE for a dataset. How can I produce a single value per training epoch? Training has to run in batches, so before using Ray I collected all outputs and labels in lists and concatenated them before calculating the metric, e.g.
    import torch
    import torchmetrics

    # Collect every batch's predictions and labels for the epoch
    all_outputs = []
    all_labels = []
    for x, labels in train_loader:
        outputs = model.forward(x, True)
        outputs = torch.squeeze(outputs)  # network has single output, so squeeze to match labels
        all_outputs.append(outputs.detach())
        all_labels.append(labels)

    # Concatenate into one tensor each, then compute a single MSE over the epoch
    all_outputs = torch.cat(all_outputs)
    all_labels = torch.cat(all_labels)
    mse = torchmetrics.functional.mean_squared_error(all_outputs, all_labels).item()
I have 2 workers running, so currently I end up with two MSE values in the report, one per worker. Is there any way to get a single value for the entire dataset instead?
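
To make the target concrete, here is a minimal sketch of the value I'm after, written in plain Python with made-up numbers and no Ray or torch involved. The names `worker_shards` and `partial_stats` are hypothetical and only for illustration. It assumes each worker can produce a sum of squared errors and a sample count for its shard; how those partial results would be gathered across Ray Train workers is exactly the part I don't know.

```python
# Hypothetical per-worker shards of (predictions, labels); the values are made up.
worker_shards = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),  # worker 0: 3 samples
    ([4.0], [1.0]),                      # worker 1: 1 sample
]


def partial_stats(preds, labels):
    """Return (sum of squared errors, sample count) for one worker's shard."""
    sq_err = sum((p - y) ** 2 for p, y in zip(preds, labels))
    return sq_err, len(preds)


stats = [partial_stats(preds, labels) for preds, labels in worker_shards]

# Global MSE: pool the sums and the counts first, then divide once.
total_sq_err = sum(s for s, _ in stats)
total_count = sum(n for _, n in stats)
global_mse = total_sq_err / total_count

# Averaging the per-worker MSE values weights every worker equally,
# so it only matches the global MSE when all shards have the same size.
mean_of_worker_mse = sum(s / n for s, n in stats) / len(stats)

print(global_mse, mean_of_worker_mse)
```

With unequal shard sizes the two numbers differ, so simply averaging the two values in the report would not, in general, give the dataset-wide MSE.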