What is the elegant way to keep Ray serve alive?

After creating endpoints, what is the elegant way to keep Ray serve alive so that clients can request at any time?

For now, I use a “while True” loop to keep the script from quitting.

Hi @matrixyy, the recommended way is to create a long-lived Ray instance in the background and deploy Serve to it:

# Start Ray and Serve in the background.
ray start --head
serve start --http-host= # If you want to expose Serve on the network.

With this mode of deployment, Serve will keep running in the background until you explicitly call ray stop.

Hi @eoakes The Ray version is 1.2. Are you sure your answer works for 1.2? The following code works (my previous method):

import ray
from ray import serve
import requests

client = serve.start()

def say_hello(request):
    return "hello " + request.query_params["name"] + "!"
# Form a backend from our function and connect it to an endpoint.
client.create_backend("my_backend", say_hello)
client.create_endpoint("my_endpoint", backend="my_backend", route="/hello")

import time

count = 0
while True:
    count += 1
    if count % 1000 == 0:
        print('looped 1000 times, sleeping 1s')
        time.sleep(1)  # Actually sleep, otherwise the loop spins a CPU core.
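A lighter-weight keep-alive than polling in a while loop (a generic sketch, not part of the Serve API; the helper name block_until_interrupt is my own) is to block the main thread until Ctrl-C:

```python
import signal
import threading

def block_until_interrupt(stop=None):
    """Block the calling thread until SIGINT (Ctrl-C) arrives.

    `stop` is injectable for testing; by default a fresh Event is used.
    """
    if stop is None:
        stop = threading.Event()
    # Trip the event on Ctrl-C so wait() returns and the script can exit.
    signal.signal(signal.SIGINT, lambda signum, frame: stop.set())
    stop.wait()
    return True

# In the deployment script, after creating the backend and endpoint:
# block_until_interrupt()
```

Serve itself runs in background actors; the main thread only needs to stay alive, so an event wait uses no CPU while waiting.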

However, when I use your method (removing the while loop and running the two commands first, which execute successfully), I get an error when sending an HTTP request:
Path /hello not found. Please ping http://.../-/routes for routing table

For version 1.2, I can’t directly use that.

The only thing I can use is:

client = serve.start()

It seems that with your method there is no need to call serve.start() in the code.

Hi @matrixyy,

Assuming you are on Ray 1.2.0, after running @eoakes's commands to start a long-running Ray cluster and Serve instance, you will need to run the following code in Python:

ray.init(address="auto") # Connect to the running Ray cluster.
client = serve.connect() # Connect to the running Serve instance.

Then you can use the client as usual in your code.

In general you’ll want to make sure you’re using the right version of the documentation. Here’s the documentation for the nightly build: Ray Serve: Scalable and Programmable Serving — Ray v2.0.0.dev0

And here’s the documentation for the latest pip release (1.2.0): Ray Serve: Scalable and Programmable Serving — Ray v1.2.0


Hi @architkulkarni It works! (using connect, not start) Thanks! But I have another question: what is the correct way to include a data file? My service API is simple: it just loads a CSV and returns an IP based on the query parameter:

import pandas as pd

def service_return(request):
    # Note: this relative path is resolved against the working directory.
    df_csv = pd.read_csv('../../data/final_result.csv')
    # output = some operations on df_csv
    return output
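One likely cause of a one-off failure here is the relative path: '../../data/final_result.csv' is resolved against the process's working directory, which can differ between running the script directly and running it against a cluster started with ray start. A sketch (the load_table helper and the caching are my additions, not part of the original service) that resolves the path against the script file and reads the CSV only once:

```python
import functools
import os

import pandas as pd

# Resolve the data file relative to this script rather than the current
# working directory (fall back to the CWD when __file__ is unavailable,
# e.g. in an interactive session).
BASE_DIR = (os.path.dirname(os.path.abspath(__file__))
            if "__file__" in globals() else os.getcwd())
CSV_PATH = os.path.join(BASE_DIR, "..", "..", "data", "final_result.csv")

@functools.lru_cache(maxsize=1)
def load_table(path=CSV_PATH):
    # Read the CSV a single time; later calls reuse the cached DataFrame.
    return pd.read_csv(path)

def service_return(request):
    df_csv = load_table()
    # Placeholder: replace with the real lookup on df_csv.
    return str(len(df_csv))
```

Loading the file once at startup also avoids re-reading it from disk on every request.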

When I first use your method, there is an error in the log:


But when I restarted all the services and ran the commands again, there was no error and it returned the result successfully. Very weird!

Is there any difference between serve.start(detached=True) and serve start? @eoakes

No difference, you can use either one depending on whether you prefer the command-line interface or Python. The command serve start actually just calls serve.start(detached=True) in Python.

Hi @architkulkarni, I’m playing with the nightly build of Ray. I got an error when I tried serve start from the command line:

RuntimeError: serve.start(detached=True) should not be called in anonymous Ray namespaces because you won’t be able to reconnect to the Serve instance after the script exits. If you want to start a long-lived Serve instance, provide a namespace when connecting to Ray. See the documentation for more details: Using Namespaces — Ray v2.0.0.dev0

But there seems to be no way to pass a namespace through this interface.

Any thought?

My ray version is: ray, version 2.0.0.dev0, and my python version is 3.8


In your Python script you need to add:

ray.init(address='auto', ignore_reinit_error=True, namespace='serve')
serve.start(detached=True)  # or detached=False

This ensures that serve.start attaches to the namespace you created when running ray.init.