First of all, thanks for your time and efforts!
I have been struggling to make use of ray, and ray.data in particular.
I am using Ray via Domino Data Lab, where I can spin up a cluster (here: 1x head + 3x workers on the "medium tier compute": 4 cores, 15 GB RAM each). Ray==1.9.2.
I am trying to load a 2.8 GB Parquet file into a Ray Dataset, but however I play with it, it crashes and shows a different error each time. Typically, if I don't connect to the cluster I can run it in local mode, but once connected to the cluster, Ray uses only one worker regardless of the `parallelism` parameter I pass to `read_parquet`. The file is small enough to fit on any of these machines, but for some reason it runs out of RAM and the worker gets killed.
I am really confused and not sure what is going wrong.
Any suggestions?