How to Send Request to Ray Serve if the Server Terminates Right after Starting?

random · April 11, 2024, 9:22am

Apologies for the noob question as I am just getting started with ray. This is the script I am using to experiment with ray serve.

#!/usr/bin/env python3
# encoding: utf-8
"""Vanilla Ray Serve Testing"""

import logging
from time import sleep
from typing import Dict

import ray
import requests
from ray import serve
from starlette.requests import Request

logging.basicConfig(format='%(asctime)s|%(levelname)s: %(message)s',
                    datefmt='%H:%M:%S, %d-%b-%Y', level=logging.INFO)

ray.init() # Works fine without this line


@serve.deployment
class MyModelDeployment:
    """The server class."""

    def __init__(self, msg: str):
        """Initialize model state: could be very large neural net weights."""
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        """Implement the functional interface."""
        logging.info(msg='Got a call.')
        return {"result": self._msg}


app = MyModelDeployment.bind(msg="Hello world!")
serve.run(app, route_prefix="/")
print('Server deployed successfully.')
print(requests.get("http://localhost:8000/").json())

It is working without error, as the last line prints the output of the curl request. But the script terminates after execution (and the server stops running), is that supposed to happen?

When I do backend development with flask or fastapi, my experience is, the service stays alive (unless killed by appropriate signal), and listens to the configured port for request. In fact, that is my understanding of any service, that keeps running at a specific port, unless killed. Is not that how ray serve supposed to work, if a client is to use the web interface?

Clearly, I have some conceptual gap in understanding the purpose and usecase of ray serve here, so any guidance or reference will be appreciated.

Also, it seems the line ray.init() is really superfluous, as the script works exact the same way without the line. So why does that line matter?

P. S. If it matters, I am running the script on my local laptop (Ubuntu 22.04) without any cluster.

David · June 23, 2024, 6:26pm

I experience exactly the same problem. Ray’s serve.run starts and dies immediately. I have searched for the answers to no avail.

random · June 24, 2024, 1:34am

A hack is to use an endless sleeping loop in the end to keep the service alive.

from time import sleep
... # Put your application code here
while True:
    sleep(2**10) # Just sleep, do not die. This to keep the process alive.

This looks really ugly in my eyes though, but not sure if there is a documented way of achieving this outcome.

shrekris · June 24, 2024, 5:11pm

If you’re starting the server using the serve.run() Python API, then you would need to add an infinite while loop as @random suggests. To start a long-running server, there are a couple options:

During development, you can run serve run in a terminal and leave the command running (example). Then you can submit requests from another terminal window.
In production, you can use KubeRay or serve deploy to submit a Serve config file to a Ray cluster (docs).

Topic		Replies	Views
What is the elegant way to keep Ray serve alive? Ray Serve	10	2949	May 2, 2023
Ray Serve - Client request Cancellation Ray Serve	2	116	March 27, 2025
Dumb but essential questions thread Ray Serve	7	564	January 19, 2022
Ray Serve with Fast API and Serve batch- Client Request cancellation RLlib	0	60	January 3, 2025
Ray serve example does not work Ray Serve	5	2192	February 11, 2021

How to Send Request to Ray Serve if the Server Terminates Right after Starting?

Related topics