Your question is a bit vague. Could you clarify which issue or error you’re seeing in Ray: a crash, a hang, a specific exception, or unexpected behavior? If you can share an error message, a log snippet, or a description of the symptoms, I can give a more targeted answer.
If Ray halts at random without logging any error or warning, possible causes include task retries, resource exhaustion, or failures that are never surfaced to the driver. For example, setting RAY_TASK_MAX_RETRIES=0 has resolved silent halts for some users, and in some reports nothing appears in the logs even after all work has stopped. Reviewing the driver logs and checking resource utilization at the moment of the stall can help narrow it down, but a precise diagnosis needs more details. See this discussion for a similar issue.
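As a rough illustration of the two checks above, here is a minimal sketch. It uses the per-task `max_retries=0` option of `@ray.remote` rather than the environment variable mentioned above, so that a task whose worker process dies raises an error at `ray.get` instead of being retried. The names `utilization`, `probe`, and `run_probe` are hypothetical, and treating a resource missing from `ray.available_resources()` as fully used is an assumption.

```python
def utilization(total, available):
    """Return the fraction of each resource currently in use.

    `total` and `available` are dicts shaped like the results of
    ray.cluster_resources() and ray.available_resources().
    Assumption: a resource absent from `available` is fully in use.
    """
    report = {}
    for name, capacity in total.items():
        if capacity <= 0:
            continue  # skip empty entries to avoid division by zero
        free = available.get(name, 0.0)
        report[name] = round((capacity - free) / capacity, 3)
    return report


def run_probe():
    """Hypothetical helper: run one task with retries disabled, then report usage."""
    import ray  # imported here so the helper above works without Ray installed

    ray.init()

    # max_retries=0 disables automatic retries after a worker process failure,
    # so such a failure reaches the driver instead of being retried silently.
    @ray.remote(max_retries=0)
    def probe(x):
        return x * 2

    try:
        print("probe result:", ray.get(probe.remote(21)))
    except ray.exceptions.RayError as err:
        print("task failed without retry:", err)

    # Values close to 1.0 suggest tasks may be waiting on that resource.
    print(utilization(ray.cluster_resources(), ray.available_resources()))
    ray.shutdown()
```

Calling `run_probe()` from the driver while the job appears stuck gives a quick view of whether CPU, memory, or object store capacity is saturated. It does not replace reading the driver and worker logs.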
Would you like more detail or have a specific error message to share?
Hint: Mention @RunLLM in the post for followups.