Parallelize TorchTrainer + Preprocessor + Training?

Is there an example available for using TorchTrainer + Preprocessor in parallel? In my setup, Ray preprocesses everything first and then trains. Is there an example that shows how to train and preprocess batches on-the-fly?

@localh Ray Data is now doing streaming execution by default. (Try upgrading to 2.7 if you’re on an older version of Ray.)

This means that batch fetching from S3 + preprocessing + feeding into training all happen in a streaming fashion.
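Here is a minimal sketch of what that looks like. The bucket path, the `preprocess` function, and the training-loop body are placeholders for illustration; the point is that `map_batches` stays lazy and the preprocessing runs as batches stream into the workers during `trainer.fit()`.

```python
import ray
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

# Hypothetical dataset location; replace with your own S3 path.
ds = ray.data.read_parquet("s3://my-bucket/train-data/")

# Placeholder per-batch preprocessing. map_batches is lazy, so this
# runs on-the-fly while training consumes the stream.
def preprocess(batch):
    batch["feature"] = batch["feature"] * 2.0
    return batch

ds = ds.map_batches(preprocess)

def train_loop_per_worker(config):
    # Each worker receives a streaming shard of the dataset.
    shard = train.get_dataset_shard("train")
    for epoch in range(2):
        for batch in shard.iter_torch_batches(batch_size=128):
            pass  # forward/backward pass would go here

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
    datasets={"train": ds},
)
trainer.fit()
```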

The only way to get the "preprocess everything first" behavior is to call `ds.materialize()` at some point before training starts. See here for more info: Ray Data Internals — Ray 2.7.1
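For contrast, a hedged sketch of what triggers the up-front behavior (same hypothetical `preprocess` as above): materializing executes the whole pipeline eagerly and pins the result in the object store before the trainer ever runs.

```python
# Eager: runs all preprocessing now and holds the result in memory.
materialized_ds = ds.map_batches(preprocess).materialize()

# Streaming (default): nothing executes until training iterates batches.
streaming_ds = ds.map_batches(preprocess)
```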