Medium: Significantly affects my productivity but can find a workaround.
2. Environment:
- Ray version: 2.41.0
- Python version: 3.11
- OS: AL2
- Cloud/Infrastructure: AWS EKS 1.30
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: ReadParquet->SplitBlocks(2) should not show errors when the whole Ray job is successful
- Actual: ReadParquet->SplitBlocks(2) shows errors even though the output is correct and expected and Ray job was successful
Hi all,
We’ve been using Ray for some time, but only recently started dipping our toes into Ray Data. I ran a very simple Ray application to test ray.data, but when I inspect the logs in our Ray cluster, I see a failure for ReadParquet->SplitBlocks(2), even though the overall Ray job is shown as successful.
This job was executed on an existing Ray cluster with one available Ray worker with sufficient memory, and with the Ray autoscaler enabled.
I then scaled up a few additional workers before running the job again, and there were no more failures. Oddly, I printed out the row counts of the Ray dataset and the file sizes in both cases (the job with the failed ReadParquet->SplitBlocks(2) and the one without), and they tally up.
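For anyone who wants to repeat that comparison, here is a minimal sketch of this kind of tally. It is an illustration rather than the exact script from the job: the path is a placeholder, the helper names parquet_bytes_on_disk and summarize are made up, and it assumes the Parquet files sit on a local filesystem.

```python
import os


def parquet_bytes_on_disk(root):
    """Sum the sizes of all .parquet files under a local directory."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".parquet"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


def summarize(path):
    """Read the Parquet data with Ray Data and return the tallies to compare."""
    import ray  # imported lazily so the disk helper works without Ray installed

    ds = ray.data.read_parquet(path)
    return {"rows": ds.count(), "bytes_on_disk": parquet_bytes_on_disk(path)}


if __name__ == "__main__":
    # Placeholder path (assumption): point this at the real dataset.
    print(summarize("/tmp/example_parquet"))
```

Running this once on the single-worker cluster and once after scaling up, then comparing the two printed dictionaries, gives the same kind of check described above.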
So my question is: is this error expected, and is it safe to ignore given that the overall job succeeded? Either way, what causes the ReadParquet->SplitBlocks(2) error?