How to Send Request to Ray Serve if the Server Terminates Right after Starting?

Apologies for the noob question as I am just getting started with ray. This is the script I am using to experiment with ray serve.

#!/usr/bin/env python3
# encoding: utf-8
"""Vanilla Ray Serve Testing"""

import logging
from time import sleep
from typing import Dict

import ray
import requests
from ray import serve
from starlette.requests import Request

logging.basicConfig(format='%(asctime)s|%(levelname)s: %(message)s',
                    datefmt='%H:%M:%S, %d-%b-%Y', level=logging.INFO)

ray.init() # Works fine without this line


@serve.deployment
class MyModelDeployment:
    """The server class."""

    def __init__(self, msg: str):
        """Initialize model state: could be very large neural net weights."""
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        """Implement the functional interface."""
        logging.info(msg='Got a call.')
        return {"result": self._msg}


app = MyModelDeployment.bind(msg="Hello world!")
serve.run(app, route_prefix="/")
print('Server deployed successfully.')
print(requests.get("http://localhost:8000/").json())

It is working without error, as the last line prints the output of the curl request. But the script terminates after execution (and the server stops running), is that supposed to happen?

When I do backend development with flask or fastapi, my experience is, the service stays alive (unless killed by appropriate signal), and listens to the configured port for request. In fact, that is my understanding of any service, that keeps running at a specific port, unless killed. Is not that how ray serve supposed to work, if a client is to use the web interface?

Clearly, I have some conceptual gap in understanding the purpose and usecase of ray serve here, so any guidance or reference will be appreciated.

Also, it seems the line ray.init() is really superfluous, as the script works exact the same way without the line. So why does that line matter?

P. S. If it matters, I am running the script on my local laptop (Ubuntu 22.04) without any cluster.

I experience exactly the same problem. Ray’s serve.run starts and dies immediately. I have searched for the answers to no avail.

A hack is to use an endless sleeping loop in the end to keep the service alive.

from time import sleep
... # Put your application code here
while True:
    sleep(2**10) # Just sleep, do not die. This to keep the process alive.

This looks really ugly in my eyes though, but not sure if there is a documented way of achieving this outcome.

If you’re starting the server using the serve.run() Python API, then you would need to add an infinite while loop as @random suggests. To start a long-running server, there are a couple options:

  • During development, you can run serve run in a terminal and leave the command running (example). Then you can submit requests from another terminal window.
  • In production, you can use KubeRay or serve deploy to submit a Serve config file to a Ray cluster (docs).