What's the recommended way to get intermediate model predictions out of Ray Tune?

I use Tune for two things:

  1. Tuning models (Validation-set and K-fold approach)
  2. Running N models to measure mean and std performance.

In the 2nd case, I want to see the model predictions from all the models, not just the test metric.

How do I fetch this?
The Trainable framework supports step and save_checkpoint, but neither really fits the use case. I calculate metrics after each step, and also on the final step.

At the moment, I’ve resorted to some weird ideas like compressing the predictions DataFrame into a string and calling that a “metric”, but this honestly seems like a broken approach.

You’re correct, a “metric” should be a small, lightweight summary value of your training – something like training/validation loss.

You can save the predictions as generic trial artifacts by writing them to the current working directory from within a Tune Trainable.

For example:

import os

from ray import tune

class MyTrainable(tune.Trainable):
    def step(self):
        # Do training, then generate predictions.
        # predictions = ...

        iteration = self.training_iteration
        # Binary mode ("wb"), since torch.save writes bytes.
        with open(f"./predictions_for_iter={iteration}.pt", "wb") as f:
            # Dump your pandas DF or save in whatever way you want!
            # torch.save(predictions, f)
            pass

        # step() must return a dict of lightweight metrics.
        return {"loss": 0.0}  # placeholder value

# In practice, also pass a stop condition so the trials terminate.
tuner = tune.Tuner(MyTrainable)
results = tuner.fit()

for result in results:
    # All your artifacts are saved relative to the trial directory
    print("Trial directory:", result.log_dir)  # On ray<=2.4
    # print("Trial directory:", result.path)  # On ray>2.4 (on nightly as of 6/5)
    print(os.listdir(result.log_dir))  # All of your artifacts should be here

This user guide section may also be helpful. It goes over how to get data out of tune in the form of an AIR checkpoint: Getting Data in and out of Tune — Ray 2.4.0
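
For the "run N models and look at mean and std" case, here is a rough sketch of how you might read the artifacts back once the run finishes. It does not call Ray at all: it only walks the trial directories (for example `[result.log_dir for result in results]`). It assumes each trial wrote its predictions as a JSON list of numbers named `predictions_for_iter=<N>.json`, which is a stand-in for whatever format you actually save. The helper names `collect_predictions` and `final_prediction_stats` are made up for illustration.

```python
import json
import statistics
from pathlib import Path


def collect_predictions(trial_dirs, pattern="predictions_for_iter=*.json"):
    """Map each trial directory to {iteration: predictions}.

    Assumes (hypothetically) that every artifact is a JSON list of numbers.
    """
    collected = {}
    for trial_dir in trial_dirs:
        per_iter = {}
        for path in Path(trial_dir).glob(pattern):
            # File stem looks like "predictions_for_iter=3".
            iteration = int(path.stem.split("=")[1])
            per_iter[iteration] = json.loads(path.read_text())
        collected[str(trial_dir)] = per_iter
    return collected


def final_prediction_stats(collected):
    """Per-sample mean and population std of each trial's last predictions."""
    finals = [per_iter[max(per_iter)] for per_iter in collected.values() if per_iter]
    means = [statistics.mean(values) for values in zip(*finals)]
    stds = [statistics.pstdev(values) for values in zip(*finals)]
    return means, stds
```

If you save DataFrames or tensors instead, swap `json.loads` for the matching loader; the directory walk stays the same.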