Ray Data `ReadParquet->SplitBlocks(2)` shows a failure even though the entire Ray job is successful

Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

  • Ray version: 2.41.0
  • Python version: 3.11
  • OS: AL2
  • Cloud/Infrastructure: AWS EKS 1.30
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected: ReadParquet->SplitBlocks(2) should not show errors when the whole Ray job is successful
  • Actual: ReadParquet->SplitBlocks(2) shows errors even though the output is correct and the Ray job was successful

Hi all,

We’ve been using Ray for some time, but only recently started dipping our toes into Ray Data. I ran a very simple Ray application to test ray.data, but when I inspect the logs in our Ray cluster, I’m seeing a failure with ReadParquet->SplitBlocks(2), although the overall Ray job is shown as successful.

This job was executed on an existing Ray cluster with one available Ray worker that had sufficient memory, and with the Ray autoscaler enabled.

I then scaled up a few more workers before running the job again, and there were no more failures. Oddly, I printed out the row counts of the Ray dataset and the file sizes in both cases (the job that had the failed ReadParquet->SplitBlocks(2), and the one without), and they tally up.
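For anyone wanting to repeat the comparison, here is a minimal sketch of the kind of check described above. The path argument and both helper names are hypothetical, and `size_bytes()` reports Ray Data's in-memory size estimate rather than the on-disk file size, so treat it as an illustration only.

```python
def summarize_parquet(path: str) -> dict:
    """Read a Parquet dataset with Ray Data and return its row count and size."""
    # Imported lazily so the comparison helper below works without Ray installed.
    import ray

    ds = ray.data.read_parquet(path)  # `path` is a placeholder for your dataset location
    return {"rows": ds.count(), "size_bytes": ds.size_bytes()}


def summaries_match(a: dict, b: dict) -> bool:
    """Return True when two runs report the same row count and size."""
    return a["rows"] == b["rows"] and a["size_bytes"] == b["size_bytes"]
```

Running `summarize_parquet` once on the single-worker cluster and once after scaling up, then passing both results to `summaries_match`, gives a quick yes/no on whether the two runs agree.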

So my question is: is this error expected, and can we safely ignore it since the overall job succeeded? Even if so, what causes the ReadParquet->SplitBlocks(2) error?

Hello! I believe this is due to an autoscaling error. It is mentioned briefly here: Debugging Ray Data auto-scaling errors | Anyscale Docs

So yes, you don’t need to worry about the errors since the overall job succeeded. The error comes from the cluster not having enough resources to satisfy the block-splitting requirement. If you see a lot of these errors, though, it might mean your cluster needs more resources allocated to it, so you might want to adjust your autoscaling or resource allocation for those jobs.
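If it helps, here is a rough sketch of how you could check what the cluster has free before starting a job. `ray.available_resources()` is a real Ray call; the `needed_cpus` threshold and both helper names are hypothetical, and this only looks at resources that are free right now, not nodes the autoscaler is about to add.

```python
def has_cpu_headroom(available: dict, needed_cpus: float) -> bool:
    """Return True if the reported free CPUs cover the (assumed) requirement."""
    return available.get("CPU", 0.0) >= needed_cpus


def cluster_has_cpu_headroom(needed_cpus: float) -> bool:
    """Connect to the running cluster and check its currently free CPUs."""
    # Imported lazily so has_cpu_headroom can be used without Ray installed.
    import ray

    ray.init(address="auto", ignore_reinit_error=True)
    return has_cpu_headroom(ray.available_resources(), needed_cpus)
```

If the check comes back `False` on the single-worker cluster, that would be consistent with the operator waiting on resources until the autoscaler catches up.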
