Where to apply data augmentations when using the Trainer?

I have some very CPU-intensive data augmentations. When using torchdata, I would just add these as a map step.
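
Roughly like this, for reference (a minimal sketch; `augment` stands in for the real CPU-heavy transform):

```python
from torchdata.datapipes.iter import IterableWrapper

def augment(sample):
    # Placeholder for the actual CPU-intensive augmentation.
    return sample

# .map() runs lazily, so the augmentation is re-applied on
# every pass over the pipe (i.e. every epoch).
pipe = IterableWrapper(range(10)).map(augment)
```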

However, I’m not sure where to add these in the preprocessing pipeline.
From reading the source code, I see comments like:

                    # If the window size is infinity, the preprocessor is cached and
                    # we don't need to re-apply it each time.

which seems to imply that, in non-streaming modes, the preprocessor is only applied once.
That would not be useful if I wanted to re-augment the data every epoch.

Is the solution just to use streaming mode?

Hey Vedant, thanks for reaching out!

Looks like your question was already answered on Slack. I’m posting the answer in case anyone else discovers this thread.

If you set `use_streaming_api` to `True` and specify a finite `stream_window_size`, then preprocessing operations are applied every epoch.
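
For reference, a minimal sketch of what that wiring can look like. I'm assuming the Ray AIR `TorchTrainer` / `DatasetConfig` setup here, using the field names from this thread; exact names and defaults can differ between Ray versions, and `train_loop`, `train_dataset`, and `my_augmentation_preprocessor` are placeholders:

```python
from ray.air.config import DatasetConfig, ScalingConfig
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker=train_loop,       # placeholder training function
    scaling_config=ScalingConfig(num_workers=2),
    datasets={"train": train_dataset},      # placeholder ray.data.Dataset
    dataset_config={
        "train": DatasetConfig(
            # A finite window enables windowed (streaming) ingest, so the
            # preprocessor is re-applied to each window on every epoch
            # instead of being computed once and cached.
            use_streaming_api=True,
            stream_window_size=1024**3,     # e.g. a 1 GiB window
        )
    },
    preprocessor=my_augmentation_preprocessor,  # the CPU-heavy augmentation
)
result = trainer.fit()
```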