Apologies for the noob question, as I am just getting started with Ray. This is the script I am using to experiment with Ray Serve:
#!/usr/bin/env python3
# encoding: utf-8
"""Vanilla Ray Serve Testing"""
import logging
from time import sleep
from typing import Dict

import ray
import requests
from ray import serve
from starlette.requests import Request

logging.basicConfig(format='%(asctime)s|%(levelname)s: %(message)s',
                    datefmt='%H:%M:%S, %d-%b-%Y', level=logging.INFO)

ray.init()  # Works fine without this line


@serve.deployment
class MyModelDeployment:
    """The server class."""

    def __init__(self, msg: str):
        """Initialize model state: could be very large neural net weights."""
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        """Implement the functional interface."""
        logging.info(msg='Got a call.')
        return {"result": self._msg}


app = MyModelDeployment.bind(msg="Hello world!")
serve.run(app, route_prefix="/")
print('Server deployed successfully.')
print(requests.get("http://localhost:8000/").json())
The script runs without error, and the last line prints the response to the HTTP request. However, the script exits right after that and the server stops running. Is that supposed to happen?
When I do backend development with Flask or FastAPI, the service stays alive (unless killed by an appropriate signal) and listens on the configured port for requests. In fact, that is my understanding of any service: it keeps running on a specific port until it is killed. Isn't that how Ray Serve is supposed to work if a client is to use the web interface?
Clearly, I have a conceptual gap in my understanding of the purpose and use case of Ray Serve, so any guidance or references would be appreciated.
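To make the behavior I expect concrete, here is a minimal sketch using only the standard library's `http.server` (not Flask, FastAPI, or Ray Serve). The names `HelloHandler`, `make_server`, and `main` are made up for illustration, and port 8000 is only chosen to match the script above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a small JSON payload."""

    def do_GET(self):
        body = json.dumps({"result": "Hello world!"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        """Silence the default per-request logging to stderr."""


def make_server(port: int = 8000) -> HTTPServer:
    """Bind the socket; nothing is served until serve_forever() is called."""
    return HTTPServer(("127.0.0.1", port), HelloHandler)


def main() -> None:
    server = make_server()
    try:
        # Blocks here, so the process stays alive until Ctrl+C or a kill signal.
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling `main()` at the end of a script keeps it in the foreground, listening on the port, until it is interrupted. That is the behavior I expected from the Ray Serve script as well.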
Also, the line ray.init() seems superfluous, as the script works exactly the same way without it. So why does that line matter?
P. S. If it matters, I am running the script on my local laptop (Ubuntu 22.04) without any cluster.