When using Ray's Data and Train modules together, if the dataset hasn't been materialized, Ray provides streaming batch execution (using the `read_*` and `map` methods), where batches are sent to training as soon as they are ready. There's also the option (using just a `read_*` method) of passing a `collate_fn` to `iter_torch_batches`. I'm curious whether the `collate_fn` used there is processed in parallel across the Ray cluster, and how that approach differs from the first one I mentioned.