Set different HTTP port for different deployments

I’m using Ray Serve and have two separate deployments.

  • In the first deployment, I start Ray Serve with:

serve.start(http_options={"host": "0.0.0.0", "port": os.environ.get("HTTP_PORT")})

  • In the second deployment, which has a different application name, I use:

serve.start(http_options={"host": "0.0.0.0", "port": os.environ.get("HTTP_PORT_2")})

However, in the second deployment, HTTP_PORT_2 is being ignored.

How can I set a different port for the second deployment?

Hi kuku! Welcome to the Ray community :blush:

Ray Serve only allows one HTTP server per Ray cluster. When you call serve.start() a second time with a different port, it does not create a new HTTP server—it simply connects to the existing Ray Serve instance, which is already using the first HTTP port.

There are a few different ways you can work around this.

Option 1: Run Each Deployment on a Separate Ray Cluster

Since HTTP configuration is cluster-scoped, you need to run each application in a separate Ray cluster to have different HTTP ports. Example:

# Terminal 1: start the first Ray cluster
ray start --head --port=6379

# file1.py: connect to the first cluster and deploy the first application
ray.init(address="127.0.0.1:6379")
serve.start(http_options={"host": "0.0.0.0", "port": int(os.environ["HTTP_PORT"])})

# Terminal 2: start the second Ray cluster
ray start --head --port=6380

# file2.py: connect to the second cluster and deploy the second application
ray.init(address="127.0.0.1:6380")
serve.start(http_options={"host": "0.0.0.0", "port": int(os.environ["HTTP_PORT_2"])})

Each deployment runs independently on its own Ray cluster, allowing different ports.
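One thing to watch out for with either option: `os.environ.get()` returns a string, or `None` if the variable is unset, so passing it straight through as the port can fail. A small helper (the name `resolve_port` is just for illustration) handles both the conversion and a fallback default:

```python
import os


def resolve_port(env_var: str, default: int = 8000) -> int:
    """Read an HTTP port from the environment, falling back to a default."""
    raw = os.environ.get(env_var)
    return int(raw) if raw else default


# e.g. serve.start(http_options={"host": "0.0.0.0", "port": resolve_port("HTTP_PORT")})
```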

Option 2: Use Ray Serve Multi-Application Support

If running multiple clusters is not feasible, you can deploy multiple applications on the same Serve instance using Serve’s multi-application support.

  1. Define multiple apps in a Serve config YAML.
  2. Deploy it (the docs linked below cover the specifics).

Instead of different ports, each application gets a different route prefix (e.g., /app1 and /app2).
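As a sketch of what that config can look like (the app names, prefixes, and import paths below are placeholders; `file1:app` assumes each file exposes a bound application object like `app = MyDeployment.bind()`):

```yaml
# serve_config.yaml — two applications on one Serve instance (illustrative names)
applications:
  - name: app1
    route_prefix: /app1
    import_path: file1:app
  - name: app2
    route_prefix: /app2
    import_path: file2:app
```

You can then deploy it with `serve deploy serve_config.yaml`, and both apps are served on the same HTTP port under their own prefixes.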

Option 3: Reverse Proxy

If you must use the same Ray cluster, but different external ports, you can use a reverse proxy like NGINX to map requests to different Serve applications.
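A minimal sketch of that proxy setup (the external ports 9001/9002 are placeholders, and it assumes Serve is listening on 8000 with `/app1` and `/app2` route prefixes as in Option 2):

```nginx
# Expose each Serve application on its own external port.
server {
    listen 9001;
    location / {
        proxy_pass http://127.0.0.1:8000/app1/;
    }
}

server {
    listen 9002;
    location / {
        proxy_pass http://127.0.0.1:8000/app2/;
    }
}
```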

Here are the docs:


Thanks, Christina.

I’m currently using option 2, where I set a different application name and route_prefix in serve.run().

My current setup:

  1. In file1.py, I have:

    • ray.init(address="auto", ....) # I also set some other params like logging.
    • serve.start(...)
    • serve.run(blocking=True)
  2. In file2.py, the structure is similar to file1.py.

  3. In the terminal, I run:

    ray start --head --port=$PORT
    python file1.py  # If I set serve.run(blocking=True), should I run this in the background?
    python file2.py
    

I tend to get confused by ray.init(), serve.start(), and serve.run(). What is the best deployment workflow for my case?

:slight_smile: I can try to explain what the different functions do.

  1. ray.init(): This is basically letting your script know it needs to connect to an existing Ray cluster. If you don’t provide specific details, it’ll try to start a local Ray cluster. This is necessary before you can use any Ray functionalities, including Ray Serve.
  2. serve.start(): This kicks off Ray Serve in your cluster and reads your HTTP options (in your multi-app setup, the per-app route prefixes are set later in serve.run()). You only need to call this once per cluster session.
  3. serve.run(): This is where you actually set your deployments live, using any configurations you’ve set up, like your app names and routes. If you set blocking=True, the function will block the terminal, which is useful for development and debugging as it streams logs to the console. However, for running multiple applications or scripts, you might want to run it in a non-blocking mode or in the background.

There’s a few deployment workflows too.

  • Single Application: If you are running a single application, you can use serve.run() with blocking=True to keep the terminal open for logs and debugging.
  • Multiple Applications: Since you have multiple scripts (file1.py and file2.py), run serve.run() in non-blocking mode or run the scripts in the background. You can do this with & in the terminal, or by setting blocking=False in the script. (Running the scripts in the background makes it easier to manage multiple applications.) Make sure each application has a unique route_prefix to avoid conflicts.

Essentially, start each Python script in a non-blocking way if they need to run concurrently. If you don’t want to use blocking=True, just make sure something keeps the process alive after deployment, with proper process management, so the terminal isn’t blocked.
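Concretely, the workflow above might look like this ($PORT and the file names are from your setup; this assumes your scripts either use blocking=True, in which case backgrounding them with & is needed, or keep themselves alive some other way):

```shell
# Start the cluster once
ray start --head --port=$PORT

# Deploy both applications in the background so the second can start
python file1.py &
python file2.py &

# Tear everything down when you are finished
ray stop
```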
