How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Dear Ray community,
I am getting the following error:
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff809f99822304543a1e3cced901000000 Worker ID: d653a51c0223abe8aa902ab8067201e7f2fcbc8c6b89fcbe93150737 Node ID: 16d6c7e905ea8267d00a2779373ed4e0a2e17bd874f8b0b801c93033 Worker IP address: 127.0.0.1 Worker port: 62559 Worker PID: 23061 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(TorchTrainer pid=23052) Worker 1 has failed.
(RayTrainWorker pid=23062) [rank1]: Traceback (most recent call last):
(RayTrainWorker pid=23062) [rank1]: File "python/ray/_raylet.pyx", line 2251, in ray._raylet.task_execution_handler
(RayTrainWorker pid=23062) [rank1]: File "python/ray/_raylet.pyx", line 2082, in ray._raylet.execute_task_with_cancellation_handler
(RayTrainWorker pid=23062) [rank1]: AttributeError: 'Worker' object has no attribute 'core_worker'
(RayTrainWorker pid=23062)
(RayTrainWorker pid=23062) [rank1]: During handling of the above exception, another exception occurred:
(RayTrainWorker pid=23062)
(RayTrainWorker pid=23062) [rank1]: Traceback (most recent call last):
(RayTrainWorker pid=23062) [rank1]: File "python/ray/_raylet.pyx", line 2290, in ray._raylet.task_execution_handler
(RayTrainWorker pid=23062) [rank1]: File "/Users/mdenadai/.local/share/virtualenvs/mine-ec3snymA/lib/python3.12/site-packages/ray/_private/utils.py", line 178, in push_error_to_driver
(RayTrainWorker pid=23062) [rank1]: worker.core_worker.push_error(job_id, error_type, message, time.time())
(RayTrainWorker pid=23062) [rank1]: ^^^^^^^^^^^^^^^^^^
(RayTrainWorker pid=23062) [rank1]: AttributeError: 'Worker' object has no attribute 'core_worker'
(RayTrainWorker pid=23062) Exception ignored in: 'ray._raylet.task_execution_handler'
(RayTrainWorker pid=23062) Traceback (most recent call last):
(RayTrainWorker pid=23062) File "python/ray/_raylet.pyx", line 2290, in ray._raylet.task_execution_handler
(RayTrainWorker pid=23062) File "/Users/mdenadai/.local/share/virtualenvs/mine-ec3snymA/lib/python3.12/site-packages/ray/_private/utils.py", line 178, in push_error_to_driver
(RayTrainWorker pid=23062) worker.core_worker.push_error(job_id, error_type, message, time.time())
(RayTrainWorker pid=23062) ^^^^^^^^^^^^^^^^^^
(RayTrainWorker pid=23062) AttributeError: 'Worker' object has no attribute 'core_worker'
(RayTrainWorker pid=23062) [2024-09-30 14:32:06,779 C 23062 2807883] task_receiver.cc:213: Check failed: objects_valid
This means that a Worker does not have core_worker set. When does this happen? I see that core_worker is set by the connect function here: ray/python/ray/_private/worker.py at 073d143c62e24f931812c6f27243974506a7049c · ray-project/ray · GitHub. So why do I get this error? Does it mean that connect is not being called somewhere?
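To make the question concrete, here is a minimal sketch of the pattern I suspect. It is not Ray's code: ToyWorker, push_error, and try_push are hypothetical names, and the only assumption is that the attribute is created inside connect() rather than in __init__, so any access before connect() raises the same kind of AttributeError as in my logs.

```python
class ToyWorker:
    """Hypothetical stand-in for a worker; NOT Ray's actual Worker class."""

    def __init__(self):
        # No core_worker attribute is created here (assumption for this sketch).
        self.connected = False

    def connect(self):
        # The attribute only comes into existence once connect() has run.
        self.core_worker = object()
        self.connected = True


def push_error(worker, message):
    # Same shape as the failing call in the traceback: assumes core_worker exists.
    return (worker.core_worker, message)


def try_push(worker):
    """Return "ok" if the push works, otherwise the AttributeError text."""
    try:
        push_error(worker, "boom")
        return "ok"
    except AttributeError as exc:
        return str(exc)


# Accessing the attribute before connect() fails.
result_before_connect = try_push(ToyWorker())

# After connect() the same call succeeds.
connected_worker = ToyWorker()
connected_worker.connect()
result_after_connect = try_push(connected_worker)

print(result_before_connect)
print(result_after_connect)
```

If this is roughly what happens, the second AttributeError in my logs would just be the error handler hitting the same missing attribute, and the real question is why the task ran on a worker that was not connected (or was already disconnected).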
Thanks