Problem with anything on Ray

Hey!

First of all, thanks for your time and effort!
I have been struggling to make use of Ray, and ray.data in particular.

I am using Ray via Domino Data Lab, where I can spin up a cluster (here: 1x head + 3x workers on “medium tier compute”, 4 cores / 15 GB RAM each). Ray==1.9.2.

I am trying to load a 2.8 GB Parquet file into a Ray Dataset, and however I play with it, it crashes with a different error each time. If I don't connect to the cluster, I can typically run it in local mode. But once connected to the cluster, Ray only uses a single worker, regardless of the “parallelism” parameter I pass to read_parquet. The file is small enough to fit on any of these machines, but for some reason it runs out of RAM and the worker gets killed. A minimal sketch of what I am running is below.
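
Roughly the kind of code I'm running (the file path and cluster address are placeholders, not my real setup):

```python
import ray

# Connect to the already-running cluster (placeholder address);
# in local mode I just call ray.init() with no arguments.
ray.init(address="auto")

# 2.8 GB Parquet file; parallelism is supposed to split the read into
# many tasks so the blocks get spread across the 3 workers.
ds = ray.data.read_parquet(
    "/mnt/data/my_file.parquet",  # placeholder path
    parallelism=100,
)

print(ds.count())  # around here a worker gets OOM-killed
```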
I am really confused and not sure what went wrong.

Any suggestions?

Hi @magic-dlg, I think that you’re running into an old issue with Datasets around load balancing read tasks that was fixed a few months ago. Could you try using a Ray nightly wheel to confirm that this is the underlying issue?
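
After installing a nightly wheel (see the nightly wheels section of the Ray installation docs for the link matching your Python version and platform), you can quickly confirm which build you are actually importing:

```python
import ray

# A nightly build reports a .dev version string here,
# rather than the 1.9.2 release version.
print(ray.__version__)
```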


Hey Clark,

Many thanks for the suggestion. It helped, and now it works.
Sorry for the long delay, but with the holiday break plus the corporate environment, things can sometimes take a while.

I tested some vanilla code from the Ray docs, and Dataset and Train work so far. I still have some problems running the default Tune examples, but I think I need to look into that separately.
Thanks!
