Ray worker failed with "IOError: Broken pipe" error

Hi Guys… :blush:

I'm running a distributed job using Ray on a small cluster, and I'm getting this recurring error:

RayletClient::PushNormalTask, IOError: Broken pipe

It usually shows up after some tasks have been running fine for a while. I suspect it may be related to network instability or resource limits, but I’m not entirely sure.
I am using Ray version 2.9.0 on Ubuntu, and the tasks are not memory-intensive.

Has anyone else run into this error or know what typically causes it? Would really appreciate any suggestions or debugging tips.

Thank you so much for your help.

Hi @lipefik316, this could be caused by network instability. We have fixed many such issues in recent releases, so I'd suggest upgrading Ray to see whether the issue is resolved.
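
After upgrading (for example with `pip install -U ray`), it can help to confirm that every node in the cluster actually picked up the new version. Here is a minimal sketch that uses only the Python standard library. The helper names (`parse_version`, `installed_ray_version`, `is_newer_than`) are made up for illustration and are not part of Ray, and the 2.9.0 baseline is just the version reported above.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.9.0' into (2, 9, 0); drop suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def installed_ray_version():
    """Return the installed Ray version string, or None if Ray is missing."""
    try:
        return metadata.version("ray")
    except metadata.PackageNotFoundError:
        return None


def is_newer_than(installed, reported="2.9.0"):
    """True if `installed` is strictly newer than the version from the post."""
    return parse_version(installed) > parse_version(reported)


if __name__ == "__main__":
    version = installed_ray_version()
    if version is None:
        print("Ray is not installed in this environment")
    elif is_newer_than(version):
        print(f"Ray {version} is newer than 2.9.0")
    else:
        print(f"Ray {version} is not newer than 2.9.0; the upgrade did not apply here")
```

Run it on the head node and on each worker node. A node still on the old version would be worth fixing before testing for the broken pipe error again.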