_local_testing_mode in serve.run

Hi there! Welcome to the Ray community~

You’re right, when _local_testing_mode=True, Ray Serve doesn’t actually start an HTTP server. Instead, it runs deployments within a single process using background threads, which is useful for unit testing but not for testing HTTP endpoints. That’s why you’re getting a “Connection refused” error when trying to send requests to 127.0.0.1:8000.

If your goal is to test the logic of your deployment without starting the full Ray infrastructure, you can call the deployment handle directly in Python. For example:

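from ray import serve
from ray.serve.handle import DeploymentHandle, DeploymentResponse

handle: DeploymentHandle = serve.run(app, _local_testing_mode=True)
response: DeploymentResponse = handle.say_hello_twice.remote(name="Ray")
print(response.result())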

This way, you can verify the deployment’s behavior without relying on an HTTP request. But if you actually want to test the HTTP endpoint, you’ll need to run serve.run(app) without _local_testing_mode.
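To make the two options concrete, here is a rough sketch of how both checks could sit side by side. I don't know what your deployment looks like, so `Greeter`, `build_app`, `check_via_handle`, and `check_via_http` are names I made up for illustration, and the greeting format is an assumption. The URL assumes Serve's default HTTP address (127.0.0.1:8000) and the default route prefix. Treat it as a starting point rather than the recommended pattern.

```python
import urllib.parse
import urllib.request


class Greeter:
    """Hypothetical stand-in for the deployment discussed in this thread."""

    def say_hello_twice(self, name: str) -> str:
        greeting = f"Hello, {name}!"
        return f"{greeting} {greeting}"

    async def __call__(self, request) -> str:
        # HTTP entry point: Serve passes a Starlette Request object here.
        return self.say_hello_twice(request.query_params["name"])


def build_app():
    # Imported lazily so Greeter itself stays testable without Ray installed.
    from ray import serve

    return serve.deployment(Greeter).bind()


def check_via_handle() -> str:
    """Local testing mode: no HTTP server, so call the handle directly."""
    from ray import serve

    handle = serve.run(build_app(), _local_testing_mode=True)
    return handle.say_hello_twice.remote(name="Ray").result()


def check_via_http() -> str:
    """Regular mode: Serve starts its HTTP proxy, so requests can connect."""
    from ray import serve

    serve.run(build_app())
    try:
        query = urllib.parse.urlencode({"name": "Ray"})
        with urllib.request.urlopen(f"http://127.0.0.1:8000/?{query}") as resp:
            return resp.read().decode()
    finally:
        # Tear the application down so later tests start from a clean state.
        serve.shutdown()
```

Keeping the logic in a plain class also means `Greeter().say_hello_twice("Ray")` can be asserted on directly, with no Ray involved at all. `check_via_handle()` covers the handle path in local testing mode, and only `check_via_http()` needs a real Serve instance.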

Since this mode is still experimental, if you find it limiting for your use case, you might want to open an issue or feature request on Ray’s GitHub.

Here are some relevant docs in case you’d like to do more reading:

Docs