I have some very CPU-intensive data augmentations. When using torchdata, I would just add these as a map step.
However, I’m not sure where to add these in the pre-processing pipeline.
From reading the source code, I see comments like:
# If the window size is infinity, the preprocessor is cached and
# we don't need to re-apply it each time.
which seems to imply that in non-streaming modes the preprocessor is applied only once and its output is cached.
That would not be useful if I wanted to re-augment the data every time.
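To make the difference concrete, here is a minimal sketch in plain Python of the two behaviours I have in mind. It does not use the library's API; `augment`, `cached_epochs`, and `reapplied_epochs` are hypothetical names I made up for illustration, and `augment` is only a cheap stand-in for a real CPU-intensive random augmentation.

```python
import random


def augment(x, rng):
    # Hypothetical stand-in for an expensive random augmentation.
    return x + rng.random()


def cached_epochs(data, n_epochs, seed=0):
    # Augmentation runs once; every epoch reuses the same cached result.
    rng = random.Random(seed)
    cached = [augment(x, rng) for x in data]
    return [list(cached) for _ in range(n_epochs)]


def reapplied_epochs(data, n_epochs, seed=0):
    # Augmentation runs again on every epoch, so each epoch sees new samples.
    rng = random.Random(seed)
    return [[augment(x, rng) for x in data] for _ in range(n_epochs)]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    for name, fn in [("cached", cached_epochs), ("reapplied", reapplied_epochs)]:
        first, second = fn(data, n_epochs=2)
        print(name, "epochs identical:", first == second)
```

If the comment means what I think it does, the non-streaming path behaves like `cached_epochs`, while what I want is `reapplied_epochs`.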
Is the solution just to use streaming mode?