I successfully started a Ray cluster using a cluster.yaml file. After some time, the Docker container on the head node stopped, but the containers on all the worker nodes are still present.
I found the following in dashboard.log:
2023-10-30 11:30:39,452 WARNING worker.py:2006 -- The autoscaler failed with the following error:
Terminated with signal 15
  File "/usr/local/lib/python3.8/dist-packages/ray/autoscaler/_private/monitor.py", line 720, in <module>
    monitor.run()
  File "/usr/local/lib/python3.8/dist-packages/ray/autoscaler/_private/monitor.py", line 595, in run
    self._run()
  File "/usr/local/lib/python3.8/dist-packages/ray/autoscaler/_private/monitor.py", line 449, in _run
    time.sleep(AUTOSCALER_UPDATE_INTERVAL_S)
2023-10-30 11:30:39,500 WARNING dashboard.py:236 -- Exiting with SIGTERM immediately...
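A note on reading the log above: on Linux, signal 15 is SIGTERM, which matches the dashboard's own "Exiting with SIGTERM immediately" line. The traceback ends in `time.sleep(...)`, which suggests the autoscaler monitor was idle in its update loop when some other process asked it to exit, rather than crashing on its own. Possible senders include `ray stop` or `docker stop` (which sends SIGTERM to the container's main process before escalating to SIGKILL), but the log alone does not say which. The standard-library sketch below just decodes a signal number; the helper name `describe_signal` is hypothetical.

```python
import signal


def describe_signal(signum: int) -> str:
    """Return the symbolic name of a signal number, e.g. 15 -> 'SIGTERM'.

    Hypothetical helper for reading log lines like 'Terminated with signal 15'.
    Falls back to a generic description if the number is not a known signal
    on this platform.
    """
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


if __name__ == "__main__":
    # Decode the number reported by the autoscaler in dashboard.log.
    print(describe_signal(15))
```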
Can anyone help me figure out what the issue could be?
I also checked dashboard_agent.log. Here is what it contains:
2023-10-30 11:33:51,280 ERROR optional_utils.py:281 -- Unexpected error in handler: No module named 'fastapi'. You can run pip install "ray[serve]"
to install all Ray Serve dependencies.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/optional_utils.py", line 279, in decorator
    return await f(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/serve/serve_agent.py", line 89, in get_serve_instance_details
    from ray.serve.schema import ServeInstanceDetails
  File "/usr/local/lib/python3.8/dist-packages/ray/serve/__init__.py", line 29, in <module>
    raise e
  File "/usr/local/lib/python3.8/dist-packages/ray/serve/__init__.py", line 4, in <module>
    from ray.serve.api import (
  File "/usr/local/lib/python3.8/dist-packages/ray/serve/api.py", line 7, in <module>
    from fastapi import APIRouter, FastAPI
ModuleNotFoundError: No module named 'fastapi'. You can run pip install "ray[serve]"
to install all Ray Serve dependencies.
2023-10-30 11:33:51,281 INFO web_log.py:206 -- 10.40.40.50 [30/Oct/2023:06:03:51 +0000] "GET /api/serve/applications/ HTTP/1.1" 500 1019 "-" "Python/3.8 aiohttp/3.8.6"
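The dashboard_agent.log entry shows a different error from the autoscaler one: the agent's Serve endpoint (`GET /api/serve/applications/`) returned HTTP 500 because importing `ray.serve` failed on `from fastapi import ...`. A minimal check for whether that dependency is importable is sketched below. It assumes it is run with the same Python interpreter the dashboard agent uses (the traceback paths point at `/usr/local/lib/python3.8/dist-packages` inside the container), and the helper name `missing_modules` is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import.

    Hypothetical helper: importlib.util.find_spec returns None when a
    top-level module is not installed in the current environment.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'fastapi' is the module named in the ModuleNotFoundError above.
    missing = missing_modules(["fastapi"])
    if missing:
        print("Missing:", ", ".join(missing))
        print('The log suggests: pip install "ray[serve]"')
    else:
        print("All listed modules are importable.")
```

If `fastapi` is reported missing, the log's own suggestion, `pip install "ray[serve]"`, installs the Ray Serve dependencies into that environment.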