How to set pipeline windows for Torch Trainer?

How can I set the window size while using TorchTrainer?

Right now I am getting the warning below, but I am unsure where to apply the appropriate change:

(TorchTrainer pid=15236) ⚠️  This pipeline's windows are ~11.23GiB in size each and may not fit in object store memory without spilling. To improve performance, consider reducing the size of each window to 7.6GiB or less.

Can you share what your script looks like? This should be configurable when constructing the Ray Dataset.

You definitely can window a Ray dataset, but when you pass the resulting pipeline into a TorchTrainer, it throws an exception and refuses to proceed, saying it only accepts a Ray Dataset, not a DatasetPipeline.
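
For reference, the windowed pipeline is created along these lines (a minimal sketch; the read path is a placeholder, and Dataset.window is the pre-2.5 API that produces a DatasetPipeline):

import ray

# Placeholder read; the real input path is elided.
ds = ray.data.read_parquet("s3://my-bucket/train")

# bytes_per_window sets the size of each pipeline window; the warning
# above suggests keeping this at roughly 7.6GiB or less.
train_set = ds.window(bytes_per_window=7 * 1024**3)

Passing that pipeline to the trainer is what fails: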

...
trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    datasets={"train": train_set},
...

@localh If you're referring to DatasetPipeline in Ray Data, that mechanism is deprecated. Ray Data now uses a streaming execution model under the hood. One way to think about how to supply your Ray data to Ray Train is discussed in the Ray 2.6.1 docs; check it out. The advanced guide also discusses how you could split the dataset.
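
To make that concrete, here is a minimal sketch of the streaming approach, assuming a recent Ray 2.x API (the read path is a placeholder, and in 2.6 get_dataset_shard may live under ray.air.session rather than ray.train). You pass a plain ray.data.Dataset to TorchTrainer and iterate batches inside the training loop; Ray Data streams blocks through the object store for you, so there is no window size to tune:

import ray
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Each worker gets its own shard of the dataset and streams
    # batches from it; no manual windowing is needed.
    shard = train.get_dataset_shard("train")
    for epoch in range(2):
        for batch in shard.iter_torch_batches(batch_size=32):
            pass  # replace with the real forward/backward pass

train_set = ray.data.read_parquet("s3://my-bucket/train")  # placeholder path

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
    datasets={"train": train_set},
)
trainer.fit()

By default Ray Train splits the "train" dataset across workers; in recent Ray versions, ray.train.DataConfig controls which datasets are split.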

cc: @matthewdeng, who can correct me if I'm off track here.

If you're referring to DatasetPipeline in Ray Data, that mechanism is deprecated

That would be why!