_local_testing_mode in serve.run

christina · February 12, 2025, 1:29am

Hi there! Welcome to the Ray community~

You’re right: when _local_testing_mode=True is set, Ray Serve doesn’t start an HTTP server. Instead, it runs the deployments within a single process using background threads, which is useful for unit-testing deployment logic but means there is no HTTP endpoint to hit. That’s why you’re getting a “Connection refused” error when sending requests to 127.0.0.1:8000.
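If you want to confirm that this is what's happening, a quick check is to see whether anything is accepting TCP connections on that address at all. Here's a minimal sketch using only the standard library; is_listening is a helper name I made up for illustration, not part of Ray:

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (for example ConnectionRefusedError)
        # when nothing is accepting connections on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

After serve.run(app, _local_testing_mode=True), I'd expect is_listening() to return False unless something else on your machine happens to be using port 8000.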

If your goal is to test the logic of your deployment without starting the full Ray infrastructure, you can call the deployment handle directly in Python. For example:

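from ray import serve
from ray.serve.handle import DeploymentHandle, DeploymentResponse
# `app` is the bound application from your question; no HTTP proxy is started here.
handle: DeploymentHandle = serve.run(app, _local_testing_mode=True)
response: DeploymentResponse = handle.say_hello_twice.remote(name="Ray")
print(response.result())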

This way, you can verify the deployment’s behavior without relying on an HTTP request. If you do want to test the HTTP endpoint itself, call serve.run(app) without _local_testing_mode so that Serve starts its HTTP proxy, which listens on 127.0.0.1:8000 by default.
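To make the two options concrete, here's a rough, self-contained sketch. Since I don't have your actual deployment, Greeter is a hypothetical stand-in, and call_via_handle / call_via_http are names I invented for this example. I'm also wrapping the class with serve.deployment(...) as a function call rather than using the decorator, and importing Ray inside the functions, so the plain class can be tested without Ray at all:

```python
import urllib.parse
import urllib.request


class Greeter:
    """Hypothetical deployment class standing in for the one in the question."""

    def say_hello_twice(self, name: str) -> str:
        return f"Hello, {name}! Hello, {name}!"

    def __call__(self, request) -> str:
        # HTTP entry point; under Ray Serve, `request` is a Starlette Request.
        return self.say_hello_twice(request.query_params.get("name", "Ray"))


def call_via_handle(name: str) -> str:
    """Exercise the deployment logic through a handle, with no HTTP involved."""
    from ray import serve  # imported lazily so Greeter is usable without Ray

    app = serve.deployment(Greeter).bind()
    # Experimental flag: no HTTP proxy is started in this mode.
    handle = serve.run(app, _local_testing_mode=True)
    return handle.say_hello_twice.remote(name=name).result()


def call_via_http(name: str) -> str:
    """Exercise the real HTTP path; this starts Ray and the Serve proxy."""
    from ray import serve

    app = serve.deployment(Greeter).bind()
    serve.run(app)  # default route prefix "/" on 127.0.0.1:8000
    try:
        url = "http://127.0.0.1:8000/?name=" + urllib.parse.quote(name)
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode()
    finally:
        serve.shutdown()
```

Both call_via_handle("Ray") and call_via_http("Ray") should give you the same greeting string, assuming ray[serve] is installed and port 8000 is free. The handle version is the one to use in fast unit tests, and the HTTP version fits better in a slower integration test.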

Since this mode is still experimental, if you find it limiting for your use case, you might want to open an issue or feature request on Ray’s GitHub.

Here are some relevant docs in case you’d like to do more reading:

Docs
