Deploy Ray Serve on K8s Cluster

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi there!

I’m trying to deploy a transformer model using Ray Serve on my Kubernetes cluster, but I’m failing to follow the documentation, so I hope someone can help me over here.

  • I’m trying to deploy it on my local docker-desktop K8s cluster
  • I have deployed the KubeRay operator
  • I have deployed the Ray Cluster

I followed the getting started guide to create the code. I omitted some code for brevity:

import numpy as np
import torch

from ray import serve
from starlette.requests import Request
from typing import Dict

@serve.deployment
class Predictor:
    def __init__(self):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print("device: ", device)
        self.model = torch.load(
            "sbert_v0_epoch2_totaln40000.torch", map_location=device
        )

    async def __call__(self, starlette_request: Request) -> Dict:
        request = await starlette_request.json()
        text = request["text"]
        return text

predictor = Predictor.bind()
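
For reference, this is roughly how I would query the deployment once it is running. It is only a sketch: it assumes Serve's HTTP port 8000 is reachable on localhost (for example through a port-forward), and `SERVE_URL`, `build_request`, and `query` are names I made up for illustration.

```python
import json
import urllib.request

# Assumption: Serve's HTTP port (8000) is reachable on localhost.
SERVE_URL = "http://localhost:8000/"


def build_request(text: str, url: str = SERVE_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body carries the "text" field
    that Predictor.__call__ reads."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(text: str, url: str = SERVE_URL) -> str:
    """Send the request and return the raw response body as a string."""
    with urllib.request.urlopen(build_request(text, url)) as response:
        return response.read().decode("utf-8")
```

Calling `query("hello world")` should then hit `Predictor.__call__`, which in the code above returns the submitted text.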

I have the following config file, created with serve build create_vectors:predictor -o predictor_config.yaml:

# This file was generated using the `serve build` command on Ray v2.1.0.

import_path: create_vectors:predictor

runtime_env: {}


port: 8000

deployments:

- name: Predictor

My Ray Cluster is defined as follows:

apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-complete
spec:
  rayVersion: "2.0.0"
  enableInTreeAutoscaling: true
  headGroupSpec:
    serviceType: LoadBalancer # Options are ClusterIP, NodePort, and LoadBalancer
    enableIngress: false # Optional
    rayStartParams:
      block: "true"
      dashboard-host: ""
    template: # Pod template
        metadata: # Pod metadata
        spec: # Pod spec
            containers:
            - name: ray-head
              image: rayproject/ray-ml
              # Keep this preStop hook in each Ray container config.
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh","-c","ray stop"]
              ports: # Optional service port overrides
              - containerPort: 6379
                name: gcs
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
              - containerPort: 8000
                name: serve
  workerGroupSpecs:
  - groupName: ray-worker
    replicas: 0
    minReplicas: 0
    maxReplicas: 5
    rayStartParams:
        block: "true"
        num-gpus: "1"
    template: # Pod template
      spec:
        # Keep this initContainer in each workerGroup template.
        initContainers:
        - name: init-myservice
          image: busybox:1.28
          command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/; do echo waiting for myservice; sleep 2; done"]
        containers:
          - name: ray-node
            image: rayproject/ray-ml:2.0.0-gpu
            # Keep this preStop hook in each Ray container config.
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh","-c","ray stop"]

I think I should somehow incorporate my config file into the Ray Cluster definition, but I’m not sure how, or whether that’s even the right approach. How do I continue from here?


Hi @Kasper_Kooijman, thank you for your question!

For the Kubernetes deployment guide, see "Deploying on Kubernetes" in the Ray 2.1.0 documentation.

To answer your question: yes, you need to merge the two configs together (check the tips in that same guide). The result will look like kuberay/ray_v1alpha1_rayservice.yaml on the release-0.3 branch of ray-project/kuberay on GitHub.
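
As a rough sketch of what that merge could look like, the snippet below nests a Serve config and a RayCluster spec inside a single RayService manifest. The field names (`serveConfig`, `rayClusterConfig`, camelCase keys such as `importPath`) are my reading of the release-0.3 sample, so please verify them against that file. The helper names, the service name `predictor-service`, and the abbreviated cluster spec are hypothetical. `runtime_env` is skipped because it is empty in your config.

```python
import json
import re


def to_camel(key: str) -> str:
    """Convert a snake_case key such as import_path to importPath."""
    return re.sub(r"_([a-z])", lambda m: m.group(1).upper(), key)


def build_rayservice(name: str, serve_config: dict, raycluster_spec: dict) -> dict:
    """Nest a Serve config and a RayCluster spec in one RayService manifest.

    Assumption: field names follow the KubeRay release-0.3 sample.
    Only the top-level keys of each deployment entry are converted.
    """
    serve_part = {
        "importPath": serve_config["import_path"],
        "deployments": [
            {to_camel(k): v for k, v in deployment.items()}
            for deployment in serve_config.get("deployments", [])
        ],
    }
    return {
        "apiVersion": "ray.io/v1alpha1",
        "kind": "RayService",
        "metadata": {"name": name},
        "spec": {
            "serveConfig": serve_part,
            # Everything under `spec:` of the RayCluster goes here unchanged.
            "rayClusterConfig": raycluster_spec,
        },
    }


# Values taken from the question; the cluster spec is abbreviated.
serve_config = {
    "import_path": "create_vectors:predictor",
    "runtime_env": {},
    "port": 8000,
    "deployments": [{"name": "Predictor"}],
}
raycluster_spec = {"rayVersion": "2.0.0", "enableInTreeAutoscaling": True}

manifest = build_rayservice("predictor-service", serve_config, raycluster_spec)
manifest_json = json.dumps(manifest, indent=2)
```

Since JSON is also valid YAML, `manifest_json` can be written to a file and passed to `kubectl apply -f`, although in practice you will probably just edit the sample YAML by hand.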

Hope this can help you!