Hi @skabbit, thanks for your interest and for moving the discussion onto Discourse!
Have you tried removing the local:// portion from the path? I was able to read a dummy parquet file that I have on my local disk after connecting to a Ray cluster that I started locally:
Well, local:// works fine with a local Ray cluster, but it doesn’t work with a remote Ray cluster, as I mentioned above.
Here is the description of my case:
I need to run the task on a remote cluster (this is a requirement);
I need to use a large (larger than available memory) parquet file as training data;
This file is located on my local machine, but it must be used on a remote node.
If I use the file path without local://, the Ray worker just throws an error: FileNotFoundError: ./file.parquet
And this is expected, because the file doesn’t exist on the cluster, and this is mentioned in the docs as well: If the file exists only on the local node and you run this read operation in distributed cluster, this will fail as it cannot access the file from remote node.
This is exactly why it failed: running these 3 lines of code goes through Ray Client (not the head node), and the local:// scheme is not supported with Ray Client.
The suggestion here is to use Ray Jobs to submit this script to your cluster. Note that if you do this, you should change the second line of your script to just ray.init() (passing a “ray://” address makes the script use Ray Client).
You can see how to submit this script to your cluster in the example here: Quickstart Using the Ray Jobs CLI (Ray 2.2.0 docs)
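For reference, here is a minimal sketch of what the modified script could look like once it is meant to be submitted as a Ray Job. The file name read_local_parquet.py, the helper to_local_uri, and the path /data/file.parquet are my own placeholders, not anything from your setup. I am also assuming your Ray version accepts the local:// scheme in ray.data.read_parquet, and that the parquet file has already been copied to the node where the job's driver runs, since local:// does not transfer the file from your laptop.

```python
import os
import sys


def to_local_uri(path: str) -> str:
    """Prefix an absolute filesystem path with the local:// scheme."""
    return "local://" + os.path.abspath(path)


def main(parquet_path: str) -> None:
    # Imported here so the helper above stays usable without Ray installed.
    import ray

    # No "ray://" address: when submitted as a Ray Job the driver runs on the
    # cluster itself, so plain ray.init() attaches to that cluster instead of
    # going through Ray Client.
    ray.init()

    # Assumption: the file exists on the node that runs this driver script.
    ds = ray.data.read_parquet(to_local_uri(parquet_path))
    print(ds.schema())


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python read_local_parquet.py /path/to/file.parquet")
    else:
        main(sys.argv[1])
```

With the Ray Jobs CLI from the quickstart, submitting it would look roughly like `ray job submit --working-dir . -- python read_local_parquet.py /data/file.parquet`, with the cluster address supplied as the quickstart describes.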