Deploying Multiple Ray Serve Microservices on a Single Cluster with Separate Ports

I am building a microservices-based architecture using Ray Serve, where each microservice has its own deployment YAML and a predefined port configuration.

However, I am facing the following challenges:
  • When using `serve deploy` or `serve run`, each microservice appears to start its own Ray cluster instead of deploying onto a shared cluster.
  • Deploying multiple microservices on the same port results in the previous deployment being overwritten.
  • Even when different ports are specified in the individual YAML files, the services do not coexist on the same cluster as expected.
My goal is to:
  • Deploy multiple Ray Serve applications (microservices) on a single shared Ray cluster
  • Expose each microservice on a different port
  • Manage all services centrally without them deleting or replacing each other
Would KubeRay be the correct approach to achieve this?
Specifically:
  • Can KubeRay manage a single Ray cluster and deploy multiple Ray Serve applications into it?
  • Is it possible to expose each Ray Serve application on a different port using Kubernetes Services or Ingress?
  • What is the recommended architecture or best practice for deploying multiple Ray Serve microservices in a single Kubernetes cluster?
Any guidance, architectural patterns, or references would be greatly appreciated.

1. Severity of the issue: High (completely blocks me)

2. Environment:

  • Ray version: 2.52.1
  • Python version: 3.11
  • OS: Linux (Ubuntu 24.04 LTS)

You cannot expose each Ray Serve application on a different port within a single Ray cluster; Ray Serve runs one HTTP proxy (one port) per cluster. The recommended approach is to deploy multiple Ray Serve applications (microservices) on a single Ray cluster using the multi-application API, assigning each application a unique `route_prefix` (e.g., `/service1`, `/service2`).

KubeRay can manage a single Ray cluster and deploy multiple Serve applications into it, but they will all share the same HTTP(S) port: traffic is differentiated by route, not by port. To expose each app on a different external port, put a Kubernetes Ingress or reverse proxy (such as NGINX) in front and map external ports or paths to the appropriate `route_prefix` on the shared Ray Serve HTTP endpoint.
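As a minimal sketch of the multi-application approach (the application names and `import_path` values are placeholders for your own modules), a single Serve config file defines every microservice behind one shared port:

```yaml
# serve_config.yaml -- placeholder names and import paths
http_options:
  host: 0.0.0.0
  port: 8000              # one shared HTTP port for the whole cluster
applications:
  - name: service1
    route_prefix: /service1
    import_path: service1.main:app
  - name: service2
    route_prefix: /service2
    import_path: service2.main:app
```

Deploy it with `serve deploy serve_config.yaml`, which submits the config to an already running cluster (by default via the dashboard at `http://localhost:8265`) rather than starting a new one. Because the whole file is applied at once, applications with distinct names never overwrite each other on redeploy.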

For best practices: define all applications in a single serveConfigV2 YAML under the `applications` field, each with a unique `name` and `route_prefix`. Use KubeRay's RayService CRD to manage both the cluster and the Serve deployments. For port- or path-based routing, configure your Ingress or Service to direct traffic to the shared Ray Serve port, letting Serve dispatch by `route_prefix`. See the Ray Serve production guide for Kubernetes, the multi-application docs, and related community discussions for details.
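As a hedged sketch of the Ingress side (the Service name is a placeholder; KubeRay typically creates a `<rayservice-name>-serve-svc` Service for the Serve proxy), path-based rules can forward each prefix to the same shared Serve port:

```yaml
# ingress.yaml -- placeholder names; assumes an NGINX ingress controller
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ray-serve-ingress
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /service1
            pathType: Prefix
            backend:
              service:
                name: my-rayservice-serve-svc   # placeholder Serve Service
                port:
                  number: 8000                  # shared Serve HTTP port
          - path: /service2
            pathType: Prefix
            backend:
              service:
                name: my-rayservice-serve-svc
                port:
                  number: 8000
```

If you genuinely need distinct external ports rather than paths, you can instead create multiple NodePort/LoadBalancer Services that each expose a different external port but forward to the same Serve port 8000; routing inside Serve is still decided by `route_prefix`.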

