Is there an example available for using TorchTrainer + Preprocessor in parallel? In my setup, Ray preprocesses everything first and then trains. Is there an example that shows how to preprocess and train on batches on the fly?
@localh Ray Data now does streaming execution by default. (Try upgrading to 2.7 if you’re on an older version of Ray.)
This means that batch fetching from S3, preprocessing, and feeding into training all happen in a streaming fashion.
The only way to get the “preprocess everything first” behavior is to call ds.materialize() at some point before your training starts. See here for more info: Ray Data Internals — Ray 2.7.1
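Here’s a minimal sketch of what this looks like, assuming Ray >= 2.7. The S3 path, the preprocessing function, and the model are all placeholders; swap in your own:

```python
import numpy as np
import torch
import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

# Read lazily; nothing executes yet. (Hypothetical bucket path.)
ds = ray.data.read_parquet("s3://my-bucket/train/")

def preprocess(batch: dict) -> dict:
    # Placeholder per-batch transform; runs overlapped with training.
    batch["x"] = batch["x"].astype(np.float32)
    return batch

# Still lazy -- this runs in a streaming fashion during training.
ds = ds.map_batches(preprocess)

def train_loop_per_worker():
    model = torch.nn.Linear(8, 1)  # placeholder model
    shard = ray.train.get_dataset_shard("train")
    for epoch in range(2):
        # Batches are fetched, preprocessed, and fed on the fly.
        for batch in shard.iter_torch_batches(batch_size=1024):
            ...  # forward/backward pass on batch["x"], batch["y"]

trainer = TorchTrainer(
    train_loop_per_worker,
    datasets={"train": ds},
    scaling_config=ScalingConfig(num_workers=2),
)
trainer.fit()

# By contrast, adding this line before trainer.fit() would force the whole
# pipeline to execute up front, which is the behavior you're seeing:
# ds = ds.materialize()
```

As long as you don’t call ds.materialize(), the map_batches preprocessing stays lazy and overlaps with the training loop instead of running to completion first.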