[Data][ray2.2.0] Out of Memory when using ray.data.from_torch

I tried to use ray.data.from_torch to convert my torch datasets. For small datasets it works well, but for bigger ones I get Out of Memory. I had hoped that the warning that "from_torch doesn't support parallel reads" only meant the conversion would take a while, but that appears not to be the case.

(raylet) [2023-02-08 17:10:57,837 E 687650 687650] (raylet) node_manager.cc:3097: 32 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: f11dc02c17f5e48d2a7440f12189cb6f7c996708ecce7948cd54f585, IP: 172.16.7.74) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.16.7.74`
(raylet) 
(raylet) Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
(raylet) [2023-02-08 17:11:58,633 E 687650 687650] (raylet) node_manager.cc:3097: 5 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: f11dc02c17f5e48d2a7440f12189cb6f7c996708ecce7948cd54f585, IP: 172.16.7.74) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.16.7.74`
(raylet) 
(raylet) Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
(raylet) [2023-02-08 17:12:34,952 E 687650 687650] (raylet) local_object_manager.cc:360: Failed to send object spilling request: GrpcUnavailable: RPC Error message: Socket closed; RPC Error details: 
(raylet) [2023-02-08 17:12:52,983 E 687650 687650] (raylet) local_object_manager.cc:360: Failed to send object spilling request: GrpcUnavailable: RPC Error message: Socket closed; RPC Error details: 
(raylet) [2023-02-08 17:12:58,634 E 687650 687650] (raylet) node_manager.cc:3097: 15 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: f11dc02c17f5e48d2a7440f12189cb6f7c996708ecce7948cd54f585, IP: 172.16.7.74) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.16.7.74`
(raylet) 
(raylet) Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
(raylet) [2023-02-08 17:13:09,757 E 687650 687650] (raylet) local_object_manager.cc:360: Failed to send object spilling request: GrpcUnavailable: RPC Error message: Socket closed; RPC Error details: 
(raylet) [2023-02-08 17:13:11,009 E 687650 687650] (raylet) local_object_manager.cc:360: Failed to send object spilling request: GrpcUnavailable: RPC Error message: Socket closed; RPC Error details: 
(raylet) [2023-02-08 17:13:27,038 E 687650 687650] (raylet) local_object_manager.cc:360: Failed to send object spilling request: GrpcUnavailable: RPC Error message: Socket closed; RPC Error details: 
(raylet) [2023-02-08 17:13:44,302 E 687650 687650] (raylet) local_object_manager.cc:360: Failed to send object spilling request: GrpcUnavailable: RPC Error message: Socket closed; RPC Error details: 
(raylet) [2023-02-08 17:13:59,636 E 687650 687650] (raylet) node_manager.cc:3097: 7 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: f11dc02c17f5e48d2a7440f12189cb6f7c996708ecce7948cd54f585, IP: 172.16.7.74) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.16.7.74`
(raylet) 

Are there any alternatives for converting torch datasets? I already tried iterating over my torch dataset, building pieces with ray.data.from_items, and combining them with union, but that approach seems to cause problems when shuffling or splitting the resulting Ray Dataset.