We are running Ray on Kubernetes. We can't access the log files on the pod because the pod has failed; trying to exec into it gives:
error: cannot exec into a container in a completed pod; current phase is Failed
We can still see what was written to stdout with kubectl logs, though. Here's the full traceback:
Traceback (most recent call last):
  File "main.py", line 193, in <module>
    results = ray.get(futures)
  File "/usr/local/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 46, in wrapper
    return getattr(ray, func.__name__)(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/ray/util/client/api.py", line 35, in get
    return self.worker.get(vals, timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/ray/util/client/worker.py", line 196, in get
    res = self._get(obj_ref, op_timeout)
  File "/usr/local/lib/python3.7/site-packages/ray/util/client/worker.py", line 219, in _get
ray.exceptions.WorkerCrashedError: The worker died unexpectedly while executing this task. Check python-core-worker-*.log files for more information.
A worker died or was killed while executing task c4280fb54ad6b274ffffffffffffffffffffffff05000000.
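One thing we are considering, as a rough sketch only: since ray.get(futures) fails as a whole, we could resolve the futures one at a time to see which task takes the worker down. The helper name get_individually and its get_fn parameter are made up for illustration; in main.py we would pass ray.get as get_fn. This assumes the crash surfaces as a normal Python exception on the client side, as it does in the traceback above.

```python
from typing import Any, Callable, Dict, List, Tuple


def get_individually(
    futures: List[Any],
    get_fn: Callable[[Any], Any],
) -> Tuple[Dict[int, Any], Dict[int, BaseException]]:
    """Resolve futures one at a time and record which ones fail.

    get_fn is whatever resolves a single future (hypothetically ray.get).
    Returns (results, failures), both keyed by position in `futures`.
    """
    results: Dict[int, Any] = {}
    failures: Dict[int, BaseException] = {}
    for index, future in enumerate(futures):
        try:
            results[index] = get_fn(future)
        except Exception as exc:
            # With Ray, this is where WorkerCrashedError would be caught.
            failures[index] = exc
    return results, failures


# Hypothetical usage in main.py (not run here):
#   results, failures = get_individually(futures, ray.get)
#   for index, exc in failures.items():
#       print(f"task {index} failed: {exc!r}")
```

This would only tell us which task crashes, not why, so it does not replace getting at the python-core-worker-*.log files.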
Any ideas how to troubleshoot?