I am trying to use streaming mode, which previous documentation (which doesn't seem to be available anymore?) recommended for data too big to fit in memory.
My dataset_config for my TorchTrainer looks like:
dataset_config={"train": DatasetConfig(fit=True, # fit() on train only; transform all
split=True, # split data acrosss workers if num_workers > 1
global_shuffle=False, # local shuffle
max_object_store_memory_fraction=0.25 # stream mode; % of available object store memory
),
},
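For context, the trainer is wired up roughly like this (the dataset, worker count, and train loop below are simplified stand-ins for my real job, just to show how the config is passed in):

import ray
from ray.air import session
from ray.air.config import DatasetConfig, ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Each worker reads its own shard of the streamed dataset.
    shard = session.get_dataset_shard("train")
    for _ in range(2):  # epochs
        for batch in shard.iter_torch_batches(batch_size=1024):
            pass  # forward/backward pass elided

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    datasets={"train": ray.data.range(10_000_000)},  # placeholder for my real dataset
    dataset_config={
        "train": DatasetConfig(
            fit=True,
            split=True,
            global_shuffle=False,
            max_object_store_memory_fraction=0.25,
        ),
    },
)
result = trainer.fit()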
Yet any value other than the default of -1 (store data in memory, spilling if necessary) results in a GPU worker dropping out with this warning:
Warning: reader on shard 3 of the pipeline has been blocked more than 10.0s waiting for other readers to catch up. All pipeline shards must be read from concurrently
My training runs; it's just that one GPU worker sits idle.