How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi everybody, I am deploying to a GKE cluster with the KubeRay operator installed via Helm.
I also successfully deployed the RayService by using this example:
Now I am trying to deploy the RayService with a container runtime environment. The Docker image is hosted on Artifact Registry. Currently, my Dockerfile looks like this:
FROM python:3.8.16-slim-bullseye
WORKDIR /workspace
COPY . .
RUN pip install -r requirements.txt --no-cache-dir
ENTRYPOINT ["/workspace/entrypoint.sh"]
The entrypoint.sh looks like this:
#!/bin/bash
OMP_NUM_THREADS=8 serve run config.yaml --port 10080 --host 0.0.0.0
Then I updated the RayService like this:
# Make sure to increase resource requests and limits before using this example in production.
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: hs
  namespace: ray-serving
spec:
  serviceUnhealthySecondThreshold: 300 # Config for the health check threshold for service. Default value is 60.
  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for deployments. Default value is 60.
  serveConfig:
    importPath: servefastapi:api
    runtimeEnv: |
      container:
        image: asia-northeast1-docker.pkg.dev/project_id/hs:v0.3.1-rc6
    deployments:
      - name: FastAPIDeployment
        numReplicas: 1
        routePrefix: /
  rayClusterConfig:
    rayVersion: '2.4.0' # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      serviceType: ClusterIP # optional
      # the following params are used to complete the ray start: ray start --head --block --redis-port=6379 ...
      enableIngress: true
      rayStartParams:
        port: '6379' # should match container port named gcs-server
        dashboard-host: '0.0.0.0'
        num-cpus: '2' # can be auto-completed from the limits
        block: 'true'
      #pod template
      template:
        spec:
          serviceAccountName: test-sa
          containers:
            - name: ray-head
              image: rayproject/ray:2.4.0
              resources:
                limits:
                  cpu: 8
                  memory: 32Gi
                  ephemeral-storage: 64Gi
                requests:
                  cpu: 4
                  memory: 16Gi
                  ephemeral-storage: 32Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 10080
                  name: serve
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name, for this called small-group, also can be functional
        groupName: small-group
        rayStartParams:
          block: 'true'
        #pod template
        template:
          spec:
            serviceAccountName: test-sa
            initContainers:
              # the env var $FQ_RAY_IP is set by the operator if missing, with the value of the head service name
              - name: init
                image: busybox:1.28
                command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for K8s Service $RAY_IP; sleep 2; done"]
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc'
                image: rayproject/ray:2.4.0
                lifecycle:
                  preStop:
                    exec:
                      command: ["/bin/sh","-c","ray stop"]
                resources:
                  limits:
                    cpu: "8"
                    memory: "32Gi"
                    ephemeral-storage: 64Gi
                  requests:
                    cpu: "4"
                    memory: "16Gi"
                    ephemeral-storage: 32Gi
    headServiceAnnotations: {}
      # annotations passed on for the Head Service
      # service_key: "service_value"
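The manifest's own comment says `rayVersion` should match the Ray version in the container images. As a rough sanity check, the sketch below scans the manifest text for `rayproject/ray` image tags that disagree with `rayVersion`. The function name `ray_version_mismatches` is hypothetical, and the tag comparison (everything before the first `-`) is an assumption about how those tags are formed. It cannot inspect a custom image such as the one named under `runtimeEnv`, so it says nothing about which Ray version, if any, that image contains.

```python
import re


def ray_version_mismatches(manifest_text):
    """Return (rayVersion, [rayproject/ray images whose tag disagrees])."""
    # Pull the version out of a line such as: rayVersion: '2.4.0' # comment
    version_match = re.search(
        r"rayVersion:\s*['\"]?([0-9][^'\"\s#]*)", manifest_text
    )
    if version_match is None:
        raise ValueError("no rayVersion field found")
    ray_version = version_match.group(1)

    mismatches = []
    for image in re.findall(r"image:\s*(\S+)", manifest_text):
        repo, _, tag = image.rpartition(":")
        # Only official Ray images carry the Ray version in the tag;
        # other images (busybox, custom registries) are skipped.
        if not repo.startswith("rayproject/ray"):
            continue
        # Assumption: a tag like "2.4.0-py38" still counts as version 2.4.0.
        if tag.split("-")[0] != ray_version:
            mismatches.append(image)
    return ray_version, mismatches


if __name__ == "__main__":
    sample = (
        "rayVersion: '2.4.0'\n"
        "image: rayproject/ray:2.4.0\n"
        "image: busybox:1.28\n"
    )
    print(ray_version_mismatches(sample))
```

For the manifest above, both `rayproject/ray:2.4.0` images agree with `rayVersion: '2.4.0'`, so this check alone does not point to a cause.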
The kuberay-operator logs are below. The Serve service is never created.
2023-06-06T16:04:06.209Z INFO controllers.RayService Check serve health {"ServiceName": "ray-serving/hs", "isHealthy": true, "isReady": false, "isActive": false}
2023-06-06T16:04:06.223Z INFO controllers.RayService Mark cluster as waiting for Serve deployments {"ServiceName": "ray-serving/hs", "rayCluster": {"apiVersion": "ray.io/v1alpha1", "kind": "RayCluster", "namespace": "ray-serving", "name": "hs-raycluster-pzn8r"}}
2023-06-06T16:04:06.223Z INFO controllers.RayService Cluster is healthy but not ready: checking again in 2s {"ServiceName": "ray-serving/hs"}
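To make the status flags in these lines easier to compare across retries, here is a minimal sketch that splits an operator log line into its timestamp, level, logger, message, and trailing JSON payload. The helper name `parse_operator_line` is hypothetical, and the line layout is assumed only from the three lines shown above, not from any documented KubeRay log format.

```python
import json
import re

# Assumed layout: <timestamp> <LEVEL> <logger> <message> <JSON object>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+"
    r"(?P<msg>.*?)\s+(?P<fields>\{.*\})\s*$"
)


def parse_operator_line(line):
    """Return a dict for one log line, or None if it does not fit the layout."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("ts"),
        "level": match.group("level"),
        "logger": match.group("logger"),
        "message": match.group("msg"),
        # The trailing payload is plain JSON, so json.loads handles nesting.
        "fields": json.loads(match.group("fields")),
    }


if __name__ == "__main__":
    line = (
        '2023-06-06T16:04:06.209Z INFO controllers.RayService '
        'Check serve health {"ServiceName": "ray-serving/hs", '
        '"isHealthy": true, "isReady": false, "isActive": false}'
    )
    record = parse_operator_line(line)
    print(record["message"], record["fields"])
```

Applied to the first line above, the payload shows `isHealthy` as true while `isReady` and `isActive` are false, which matches the later "Cluster is healthy but not ready" message.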
I have spent time searching for how to build a Docker image that a Ray cluster can serve.
Could you point me to a guide for doing this?
Thanks so much,