Hi there! Welcome to the Ray community~
You’re right, when _local_testing_mode=True, Ray Serve doesn’t actually start an HTTP server. Instead, it runs deployments within a single process using background threads, which is useful for unit testing but not for testing HTTP endpoints. That’s why you’re getting a “Connection refused” error when trying to send requests to 127.0.0.1:8000.
If your goal is to test the logic of your deployment without starting the full Ray infrastructure, you can call the deployment handle directly in Python. For example:
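```python
from ray import serve
from ray.serve.handle import DeploymentHandle, DeploymentResponse

# `app` is your bound application; local testing mode runs it in-process, with no HTTP proxy.
handle: DeploymentHandle = serve.run(app, _local_testing_mode=True)
response: DeploymentResponse = handle.say_hello_twice.remote(name="Ray")
print(response.result())
```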
This way, you can verify the deployment’s behavior without relying on an HTTP request. But if you actually want to test the HTTP endpoint, you’ll need to run serve.run(app) without _local_testing_mode.
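If it helps, here is a small standard-library sketch for checking whether anything is actually listening on 127.0.0.1:8000 before an HTTP test sends requests. The helper name `is_port_open` is just something I made up for illustration (it is not part of Ray), and it assumes Serve's HTTP proxy is on the same host and port as in your error message.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper, not a Ray API: it only checks that *something*
    accepts connections on the port, not that it is Ray Serve.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening, or when the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on 127.0.0.1:8000; HTTP tests can run.")
    else:
        print("Nothing is listening on 127.0.0.1:8000; requests will be refused.")
```

With `_local_testing_mode=True` I would expect this to report that nothing is listening, which matches the “Connection refused” error you saw. You could use a check like this to skip HTTP-level tests when no server is running.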
Since this mode is still experimental, if you find it limiting for your use case, you might want to open an issue or feature request on Ray’s GitHub.
Here are some relevant docs in case you’d like to do more reading:
Docs