Ray Serve only allows one HTTP server per Ray cluster. When you call serve.start() a second time with a different port, it does not create a new HTTP server—it simply connects to the existing Ray Serve instance, which is already using the first HTTP port.
There are a few ways to work around this.
Option 1: Run Each Deployment on a Separate Ray Cluster
Since HTTP configuration is cluster-scoped, you need to run each application in a separate Ray cluster to have different HTTP ports. Example:
# Start the first Ray cluster
ray start --head --port=6379
# Deploy the first application (from a script that ray.init()s into this cluster)
serve.start(http_options={"host": "0.0.0.0", "port": int(os.environ["HTTP_PORT"])})
# Start the second Ray cluster on a different port
ray start --head --port=6380
# Deploy the second application (from a script connected to the second cluster)
serve.start(http_options={"host": "0.0.0.0", "port": int(os.environ["HTTP_PORT_2"])})
Each deployment runs independently on its own Ray cluster, allowing different ports.
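One pitfall worth flagging here: environment variables always come back as strings (or `None`), while the HTTP port must be an integer, so the value needs an explicit cast. A minimal sketch (the `"8000"` fallback is a hypothetical default, not from your setup):

```python
import os

# os.environ.get returns a string (or None), so cast before handing the
# port to serve.start; "8000" is a hypothetical fallback default.
port = int(os.environ.get("HTTP_PORT", "8000"))
http_options = {"host": "0.0.0.0", "port": port}
```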
Option 2: Use Ray Serve Multi-Application Support
If running multiple clusters is not feasible, you can deploy multiple applications on the same Serve instance using its multi-application support.
Define multiple apps in a Serve config YAML
Deploy it with the serve deploy CLI (our docs cover the specifics, I will link them below)
Instead of different ports, each application gets a different route name (e.g., /app1 and /app2).
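A minimal sketch of such a config, assuming your apps live in file1.py and file2.py and each module exposes a bound application object named `app` (adjust `import_path` to match your actual code):

```yaml
# serve_config.yaml -- deploy with: serve deploy serve_config.yaml
http_options:
  host: 0.0.0.0
  port: 8000          # one shared port for the whole cluster

applications:
  - name: app1
    route_prefix: /app1
    import_path: file1:app   # assumes file1.py defines `app = SomeDeployment.bind()`
  - name: app2
    route_prefix: /app2
    import_path: file2:app
```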
Option 3: Reverse Proxy
If you must use the same Ray cluster, but different external ports, you can use a reverse proxy like NGINX to map requests to different Serve applications.
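For example, a minimal NGINX sketch (the ports and route prefixes here are hypothetical) that exposes each Serve application on its own external port while Serve itself listens on a single port:

```nginx
# Serve listens on one port (8000 here); NGINX fans out two external ports.
server {
    listen 9001;
    location / {
        proxy_pass http://127.0.0.1:8000/app1/;
    }
}
server {
    listen 9002;
    location / {
        proxy_pass http://127.0.0.1:8000/app2/;
    }
}
```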
I can try to explain what the different functions do.
ray.init(): This is basically letting your script know it needs to connect to an existing Ray cluster. If you don’t provide specific details, it’ll try to start a local Ray cluster. This is necessary before you can use any Ray functionalities, including Ray Serve.
serve.start(): This kicks off Ray Serve in your cluster and reads your HTTP options. In multi-app mode the HTTP options are still cluster-wide; what distinguishes the apps is the route prefix each one gets in serve.run(). You only need to call this once per cluster session.
serve.run(): This is where you actually set your deployments live, using any configurations you’ve set up, like your app names and routes. If you set blocking=True, the call blocks the terminal, which is useful for development and debugging since it streams logs to the console. For running multiple applications or scripts, though, you’ll want non-blocking mode or to run them in the background.
There’s a few deployment workflows too.
Single Application: If you are running a single application, you can use serve.run() with blocking=True to keep the terminal open for logs and debugging.
Multiple Applications: Since you have multiple scripts (file1.py and file2.py), you should run serve.run() in a non-blocking mode or in the background. You can do this with & in the terminal, or by setting blocking=False in the script; running the scripts in the background makes it easier to manage multiple applications. Make sure each application has a unique route_prefix to avoid conflicts.
Essentially, start each Python script in a non-blocking way if they need to run concurrently. If you don’t use blocking=True, you’ll need some other way to keep the process running after deployment, with proper process management, so nothing is torn down when the script exits.
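One common way to keep such a process alive (a sketch of the general pattern, not Ray-specific API) is to block on an event that a shutdown signal sets, so the script stays up after a non-blocking serve.run but still exits cleanly on Ctrl+C or SIGTERM:

```python
import signal
import threading

# Event the main thread will wait on after deployment.
stop = threading.Event()


def handle_signal(signum, frame):
    # Wake the main thread so it can exit (and clean up) gracefully.
    stop.set()


signal.signal(signal.SIGINT, handle_signal)
signal.signal(signal.SIGTERM, handle_signal)

# In a real script: serve.run(app, blocking=False) would go here, then:
# stop.wait()  # block until a shutdown signal arrives
```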