Can Ray provide more information about warnings like this? For example, this message
2021-04-29 01:07:23,424 WARNING worker.py:1115 -- A worker died or was killed while executing task ffffffffffffffffbac51dc1a122c7929c8127ec01000000
doesn’t say which task it refers to. Perhaps Ray could include the function that the task was running, so that one can locate the problem.
rliaw
April 28, 2021, 6:16pm
Could you perhaps post the output of:
tail /tmp/ray/session_latest/logs/*
?
Yes, of course. I restarted the whole trial.
Here is the output. It’s too large to paste here, so I posted it as a gist.
https://gist.github.com/Seraphli/02021a2717adc3c60d04615df3b42fbb#file-ray_log-txt
I noticed a memory issue before the worker died. The worker process was using a lot of RES memory, about 150G, and then it died. I also noticed that this process’s state showed as ‘D’ in htop.
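For anyone who wants to record the same observation without watching htop, here is a minimal sketch that reads the process state and resident memory from `/proc/<pid>/status`. It assumes a Linux machine; the helper names `parse_proc_status` and `read_worker_status` are made up for illustration and are not part of Ray.

```python
def parse_proc_status(text):
    """Pull the process state and resident memory out of /proc/<pid>/status text."""
    info = {"state": None, "rss_kb": None}
    for line in text.splitlines():
        if line.startswith("State:"):
            # A line such as "State:  D (disk sleep)" yields the letter "D".
            info["state"] = line.split()[1]
        elif line.startswith("VmRSS:"):
            # A line such as "VmRSS:  157286400 kB" yields the integer 157286400.
            info["rss_kb"] = int(line.split()[1])
    return info


def read_worker_status(pid):
    """Hypothetical helper: read the status file of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status(f.read())
```

Calling `read_worker_status(pid)` in a loop with the worker's PID (taken from htop, for example) would show whether RES keeps growing and when the state switches to `D`.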