How to run a RayService with container Runtime Environment on RayCluster

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi everybody, I have deployed a GKE cluster with the KubeRay operator installed via Helm.
I also successfully deployed a RayService using this example:

Now, however, I am trying to deploy the RayService with a `container` runtime environment. The Docker image is hosted on Artifact Registry. Currently, my Dockerfile looks like this:

FROM python:3.8.16-slim-bullseye

WORKDIR /workspace
COPY . .

RUN pip install -r requirements.txt --no-cache-dir

ENTRYPOINT ["/workspace/entrypoint.sh"]
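For context: with the `container` runtime environment, Ray starts its workers inside the given image, so the image generally needs Ray itself installed, matching the cluster's `rayVersion` (2.4.0 here). A minimal variant of the Dockerfile, assuming `requirements.txt` does not already pin `ray[serve]==2.4.0`:

```dockerfile
# Sketch only: base the image on the matching Ray release instead of plain
# python:3.8.16-slim-bullseye, so the worker ships Ray 2.4.0
# (assumption: requirements.txt does not already install ray[serve]==2.4.0).
FROM rayproject/ray:2.4.0

WORKDIR /workspace
COPY . .

RUN pip install -r requirements.txt --no-cache-dir
```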

The entrypoint.sh looks like this:

#!/bin/bash
OMP_NUM_THREADS=8 serve run config.yaml --port 10080 --host 0.0.0.0
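The contents of `config.yaml` are not shown; for reference, a Serve config file consumed by `serve run` that matches the `importPath` and deployment in the manifest below would look roughly like this (a hypothetical sketch reconstructed from those fields, not the real file):

```yaml
# Hypothetical config.yaml, reconstructed from the serveConfig section of the
# RayService manifest (the author's actual file is not shown).
import_path: servefastapi:api
host: 0.0.0.0
port: 10080
deployments:
  - name: FastAPIDeployment
    num_replicas: 1
    route_prefix: /
```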

Then I updated the RayService manifest like this:

# Make sure to increase resource requests and limits before using this example in production.
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: hs
  namespace: ray-serving
spec:
  serviceUnhealthySecondThreshold: 300 # Config for the health check threshold for service. Default value is 60.
  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for deployments. Default value is 60.
  serveConfig:
    importPath: servefastapi:api
    runtimeEnv: |
      container:
        image: asia-northeast1-docker.pkg.dev/project_id/hs:v0.3.1-rc6
    deployments:
      - name: FastAPIDeployment
        numReplicas: 1
        routePrefix: /
  rayClusterConfig:
    rayVersion: '2.4.0' # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      serviceType: ClusterIP # optional
      # the following params are used to complete the ray start: ray start --head --block --redis-port=6379 ...
      enableIngress: true
      rayStartParams:
        port: '6379' # should match container port named gcs-server
        dashboard-host: '0.0.0.0'
        num-cpus: '2' # can be auto-completed from the limits
        block: 'true'
      #pod template
      template:
        spec:
          serviceAccountName: test-sa
          containers:
            - name: ray-head
              image: rayproject/ray:2.4.0
              resources:
                limits:
                  cpu: 8
                  memory: 32Gi
                  ephemeral-storage: 64Gi
                requests:
                  cpu: 4
                  memory: 16Gi
                  ephemeral-storage: 32Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 10080
                  name: serve
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name, for this called small-group, also can be functional
        groupName: small-group
        rayStartParams:
          block: 'true'
        #pod template
        template:
          spec:
            serviceAccountName: test-sa
            initContainers:
              # the env var $FQ_RAY_IP is set by the operator if missing, with the value of the head service name
              - name: init
                image: busybox:1.28
                command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for K8s Service $RAY_IP; sleep 2; done"]
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
                image: rayproject/ray:2.4.0
                lifecycle:
                  preStop:
                    exec:
                      command: ["/bin/sh","-c","ray stop"]
                resources:
                  limits:
                    cpu: "8"
                    memory: "32Gi"
                    ephemeral-storage: 64Gi
                  requests:
                    cpu: "4"
                    memory: "16Gi"
                    ephemeral-storage: 32Gi
    headServiceAnnotations: {}
      # annotations passed on for the Head Service
      # service_key: "service_value"
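For reference, the `container` runtime environment was, as far as I can tell, still experimental in Ray 2.4, and a commonly suggested alternative is to run the custom image as the cluster image itself rather than passing it via `runtimeEnv`. A sketch of that approach, assuming the image is rebuilt on top of `rayproject/ray:2.4.0` so the Ray versions match:

```yaml
# Alternative sketch: drop the `container` runtime environment from serveConfig
# and use the application image (assumed to be rebuilt FROM rayproject/ray:2.4.0)
# directly on the head and worker pods.
rayClusterConfig:
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            image: asia-northeast1-docker.pkg.dev/project_id/hs:v0.3.1-rc6
  workerGroupSpecs:
    - groupName: small-group
      template:
        spec:
          containers:
            - name: ray-worker
              image: asia-northeast1-docker.pkg.dev/project_id/hs:v0.3.1-rc6
```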

The kuberay-operator logs are below, and the Serve service cannot be spawned:

2023-06-06T16:04:06.209Z        INFO    controllers.RayService  Check serve health      {"ServiceName": "ray-serving/hs", "isHealthy": true, "isReady": false, "isActive": false}
2023-06-06T16:04:06.223Z        INFO    controllers.RayService  Mark cluster as waiting for Serve deployments   {"ServiceName": "ray-serving/hs", "rayCluster": {"apiVersion": "ray.io/v1alpha1", "kind": "RayCluster", "namespace": "ray-serving", "name": "hs-raycluster-pzn8r"}}
2023-06-06T16:04:06.223Z        INFO    controllers.RayService  Cluster is healthy but not ready: checking again in 2s  {"ServiceName": "ray-serving/hs"}

I have spent some time searching for guidance on building a Docker image that a Ray cluster can serve.

Can you help me with a guide for doing this?

Thanks so much,

Not an answer, but a previous discussion on this topic was never resolved.

Serving from containers is much desired in real-world deployments. I hope you receive a useful guide soon.