How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
Hi there!
I’m trying to deploy a transformer model with Ray Serve on my Kubernetes cluster, but I’m having trouble following the documentation, so I hope someone here can help me.
- I’m deploying to my local docker-desktop Kubernetes cluster
- I have deployed the KubeRay operator
- I have deployed the Ray cluster (roughly with the commands sketched below)
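For reference, this is roughly what I ran (the Helm repo and chart names are the ones from the KubeRay docs; the manifest file name is my own):

    helm repo add kuberay https://ray-project.github.io/kuberay-helm/
    helm install kuberay-operator kuberay/kuberay-operator
    # my own file name for the RayCluster manifest shown further down:
    kubectl apply -f raycluster.yaml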
I followed the Getting Started guide to create the serve.py code. I omitted some code for brevity:
import numpy as np
import torch
from ray import serve
from starlette.requests import Request
from typing import Dict


@serve.deployment
class Predictor:
    def __init__(self):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print("device: ", device)
        self.model = torch.load(
            "sbert_v0_epoch2_totaln40000.torch", map_location=device
        ).eval()

    async def __call__(self, starlette_request: Request) -> Dict:
        request = await starlette_request.json()
        text = request["text"]
        return text


predictor = Predictor.bind()
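In case the request format matters: this is how I would exercise the handler locally, since __call__ reads a JSON body with a "text" field (serve run is from the Serve CLI docs; host and port are the defaults):

    serve run create_vectors:predictor
    # in a second shell; the JSON body matches what __call__ reads:
    curl -X POST http://127.0.0.1:8000/ -H "Content-Type: application/json" -d '{"text": "hello world"}'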
I have the following config file, created with serve build create_vectors:predictor -o predictor_config.yaml:
# This file was generated using the `serve build` command on Ray v2.1.0.
import_path: create_vectors:predictor
runtime_env: {}
host: 0.0.0.0
port: 8000
deployments:
- name: Predictor
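I assume I can also tweak per-deployment options in this file; for example, to give the replica a GPU I would expect something like this (field names taken from the Serve config schema, so please correct me if I’m misreading it):

    deployments:
    - name: Predictor
      num_replicas: 1
      ray_actor_options:
        num_gpus: 1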
My Ray Cluster is defined as follows:
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-complete
spec:
  rayVersion: "2.0.0"
  enableInTreeAutoscaling: true
  headGroupSpec:
    serviceType: LoadBalancer  # Options are ClusterIP, NodePort, and LoadBalancer
    enableIngress: false  # Optional
    rayStartParams:
      block: "true"
      dashboard-host: "0.0.0.0"
    template:  # Pod template
      metadata: {}  # Pod metadata
      spec:  # Pod spec
        containers:
          - name: ray-head
            image: rayproject/ray-ml
            # Keep this preStop hook in each Ray container config.
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh", "-c", "ray stop"]
            ports:  # Optional service port overrides
              - containerPort: 6379
                name: gcs
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
              - containerPort: 8000
                name: serve
  workerGroupSpecs:
    - groupName: ray-worker
      replicas: 0
      minReplicas: 0
      maxReplicas: 5
      rayStartParams:
        block: "true"
        num-gpus: "1"
      template:  # Pod template
        spec:
          # Keep this initContainer in each workerGroup template.
          initContainers:
            - name: init-myservice
              image: busybox:1.28
              command: ["sh", "-c", "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
          containers:
            - name: ray-node
              image: rayproject/ray-ml:2.0.0-gpu
              # Keep this preStop hook in each Ray container config.
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh", "-c", "ray stop"]
I think I should somehow incorporate my Serve config file into the RayCluster definition, but I’m not sure how, or whether that’s even the right approach. How do I continue from here?
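The closest I’ve found in the docs is port-forwarding the head pod and pushing the config with serve deploy, but I don’t know whether that’s the intended workflow with KubeRay (the pod name below is a placeholder, and port 52365 is the Ray agent port the Serve CLI used around Ray 2.0/2.1; it may differ in other versions):

    # head pod name is a placeholder; find yours via kubectl get pods
    kubectl port-forward raycluster-complete-head-xxxxx 52365:52365
    serve deploy predictor_config.yaml --address http://127.0.0.1:52365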
Thanks!