Error after queue initialisation

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I got the following error after initialising the queue:

  File "/tmp/ray/session_2023-03-11_20-32-22_575180_88/runtime_resources/working_dir_files/_ray_pkg_b7b314c25cc80b56/tools/detection_actor.py", line 82, in start_job
    await self.det_holders[job_id].put_async(DetectionObj(dets, job_id))
  File "/usr/local/lib/python3.8/dist-packages/ray/util/queue.py", line 132, in put_async
    await self.actor.put.remote(item, timeout)
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
	class_name: _QueueActor
	actor_id: 7ebe8644ea290e8661bbe7fd07000000
	pid: 1995
	name: 440_dh
	namespace: raypipe
	ip: 172.16.30.130
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
The actor never ran - it was cancelled before it started running.

Here, I first initialise a queue named “det_holder” and pass it to two different actors: a detection actor puts detection objects into the queue, and a tracking actor reads those objects from it.
After the queue was initialised, the detection actor threw the above error on the very first frame, when it tried to put an object. This suggests Ray could not initialise the queue properly.
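
For context, here is a minimal sketch of the setup described above (the actor class names, the payload in place of DetectionObj, and the job-id plumbing are simplified stand-ins for my actual code):

    import ray
    from ray.util.queue import Queue

    @ray.remote
    class DetectionActor:
        def __init__(self, det_holder: Queue):
            self.det_holder = det_holder

        async def start_job(self, dets, job_id):
            # This put_async call is where the RayActorError above was raised.
            await self.det_holder.put_async((dets, job_id))

    @ray.remote
    class TrackingActor:
        def __init__(self, det_holder: Queue):
            self.det_holder = det_holder

        async def consume(self):
            # Blocks until the detection actor has put an item.
            return await self.det_holder.get_async()

    ray.init()
    det_holder = Queue()  # backed by an internal _QueueActor, as seen in the traceback
    detector = DetectionActor.remote(det_holder)
    tracker = TrackingActor.remote(det_holder)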

I could not reproduce this issue. Can anyone suggest what the reason/cause behind this might be?

hey @shyampatel, unfortunately it’s hard to tell why it crashed without logs or a code snippet. Would it be possible to get the logs of the crashed worker? Logging — Ray 2.3.0 has a bit more context on where the worker logs reside.

Actually, we restarted the cluster after this happened. If you can guide me on how to fetch the logs of the crashed worker, I can forward them.

hi @shyampatel, the logs should be under /tmp/ray/session_$timestamp on the node where the actor crashed.
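
For example, something like this can help locate the relevant files (a rough sketch; exact filenames vary, but worker log files embed the worker PID, which was 1995 in the error above):

    import glob

    # Worker stdout/stderr and core-worker logs live under the session's
    # logs/ directory; filenames include the worker PID (1995 above).
    for path in sorted(glob.glob("/tmp/ray/session_*/logs/*1995*")):
        print(path)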

I was able to reproduce this issue. Can you please tell me which specific log file you require?