I am trying to use streaming mode, which previous documentation (which doesn't seem to be available anymore?) recommended for data too big to fit in memory.
My dataset_config for my TorchTrainer looks like:
dataset_config={"train": DatasetConfig(fit=True, # fit() on train only; transform all
split=True, # split data acrosss workers if num_workers > 1
global_shuffle=False, # local shuffle
max_object_store_memory_fraction=0.25 # stream mode; % of available object store memory
),
},
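For context, the trainer is wired up roughly like this (the dataset, worker count, and train loop below are simplified stand-ins for my real job, just to show how the config is passed in):

import ray
from ray.air import session
from ray.air.config import DatasetConfig, ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Each worker reads its own shard of the streamed dataset.
    shard = session.get_dataset_shard("train")
    for _ in range(2):  # epochs
        for batch in shard.iter_torch_batches(batch_size=1024):
            pass  # forward/backward pass elided

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    datasets={"train": ray.data.range(10_000_000)},  # placeholder for my real dataset
    dataset_config={
        "train": DatasetConfig(
            fit=True,
            split=True,
            global_shuffle=False,
            max_object_store_memory_fraction=0.25,
        ),
    },
)
result = trainer.fit()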
Yet any value other than the default of -1 (store data in memory, spilling if necessary) results in a GPU worker dropping out with this warning:
Warning: reader on shard 3 of the pipeline has been blocked more than 10.0s waiting for other readers to catch up. All pipeline shards must be read from concurrently
My training runs; it's just that one GPU worker sits idle.