RayServe Autoscaling not creating Ray Pods

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I am using RayServe to run inference on an EKS cluster, deploying Ray workers on AWS Neuron (Inferentia2) nodes. I start with the same replicas and minReplicas in the RayService YAML:

    workerGroupSpecs:
    - groupName: inf2-worker-group
      replicas: 2
      minReplicas: 2
      maxReplicas: 8

Now, I have also configured the RayServe replicas under serveConfigV2. I am able to run 2 Inferentia2 worker nodes, each of which has access to the underlying Neuron accelerators. However, when I try to increase minReplicas, I don't see any pending RayWorker pods being created. In the head node logs, I see the following messages:
" 197512024-03-15 10:19:54,242 INFO autoscaler.py:469 – The autoscaler took 0.049 seconds to complete the update iteration.

197502024-03-15 10:19:54,242 WARNING resource_demand_scheduler.py:782 – The autoscaler could not find a node type to satisfy the request: [{‘CPU’: 10.0, ‘neuron_cores’: 2.0}, {‘CPU’: 10.0, ‘neuron_cores’: 2.0}, {‘CPU’: 10.0, ‘neuron_cores’: 2.0}]. Please specify a node type with the necessary resources.

19749 {‘CPU’: 10.0, ‘neuron_cores’: 2.0}: 3+ pending tasks/actors"
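
For reference, the bundle in that warning matches the ray_actor_options of my Serve deployment (full config below), so as far as I can tell each pending Serve replica is asking for the following:

    # Fragment of my serveConfigV2 (shown in full below). Each Serve replica
    # requests this bundle, which the autoscaler reports as
    # {'CPU': 10.0, 'neuron_cores': 2.0}.
    ray_actor_options:
      num_cpus: 10
      resources: {"neuron_cores": 2}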

Here is a snippet of the RayService config YAML:

    apiVersion: ray.io/v1
    kind: RayService
    metadata:
      name: stablediffusion-service
    spec:
      serviceUnhealthySecondThreshold: 900
      deploymentUnhealthySecondThreshold: 300
      serveConfigV2: |
        applications:
          - name: stable-diffusion-deployment
            import_path: "ray_serve_stablediffusion:entrypoint"
            route_prefix: "/"
            runtime_env:
              env_vars:
                MODEL_ID: "aws-neuron/stable-diffusion-xl-base-1-0-1024x1024"
                NEURON_CC_FLAGS: "-O1"
            deployments:
              - name: stable-diffusion-v2
                autoscaling_config:
                  metrics_interval_s: 0.2
                  min_replicas: 15
                  max_replicas: 20
                  look_back_period_s: 2
                  downscale_delay_s: 30
                  upscale_delay_s: 2
                  target_num_ongoing_requests_per_replica: 1
                graceful_shutdown_timeout_s: 5
                max_concurrent_queries: 100
                ray_actor_options:
                  num_cpus: 10
                  resources: {"neuron_cores": 2}
      rayClusterConfig:
        rayVersion: '2.9.0'
        enableInTreeAutoscaling: true
        headGroupSpec:
          serviceType: NodePort
          headService:
            metadata:
              name: stablediffusion-service
              namespace: stablediffusion
          rayStartParams:
            dashboard-host: '0.0.0.0'
          template:
            spec:
              containers:
                - name: ray-head
                  image:
                  imagePullPolicy: Always # Ensure the image is always pulled when updated
                  lifecycle:
                    preStop:
                      exec:
                        command: ["/bin/sh", "-c", "ray stop"]
                  resources:
                    limits:
                      cpu: "2"
                      memory: "20G"
                    requests:
                      cpu: "2"
                      memory: "20G"
        workerGroupSpecs:
          - groupName: inf2-worker-group
            replicas: 2
            minReplicas: 2
            maxReplicas: 8
            rayStartParams: {}
            template:
              spec:
                containers:
                  - name: ray-worker
                    image:
                    imagePullPolicy: Always # Ensure the image is always pulled when updated
                    lifecycle:
                      preStop:
                        exec:
                          command: ["/bin/sh", "-c", "ray stop"]
                    resources:
                      limits:
                        cpu: "90"
                        memory: "360G"
                        aws.amazon.com/neuron: "6"
                      requests:
                        cpu: "90"
                        memory: "360G"
                        aws.amazon.com/neuron: "6" # All Neuron cores of inf2.24xlarge

---

As a note, I'm using Karpenter as my cluster autoscaling solution on EKS. The problem is that even after I increase minReplicas, the Ray actors stay in the "Pending" state with the error "The autoscaler could not find a node type to satisfy the request: [{'CPU': 10.0, 'neuron_cores': 2.0}, {'CPU': 10.0, 'neuron_cores': 2.0}, {'CPU': 10.0, 'neuron_cores': 2.0}]. Please specify a node type with the necessary resources." Shouldn't Ray adjust the worker group replicas accordingly once it sees the Ray actors/pods in the pending state?
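
In case it helps frame the question: my working assumption is that the in-tree autoscaler only scales up a worker group if it knows that group's node type provides the requested resources, and that a custom resource like neuron_cores might therefore need to be declared explicitly on the worker group. The sketch below is just that assumption written out, not something I have confirmed from the docs; the resources entry in rayStartParams follows what I believe is the KubeRay convention for custom resources, and 12 is my guess for inf2.24xlarge (6 Neuron devices with 2 NeuronCores each). Is this what "specify a node type with the necessary resources" is asking for?

    workerGroupSpecs:
    - groupName: inf2-worker-group
      replicas: 2
      minReplicas: 2
      maxReplicas: 8
      # Assumption: advertise the custom resource so the autoscaler can match
      # the pending {'CPU': 10.0, 'neuron_cores': 2.0} bundles to this group.
      # 12 = 6 Neuron devices x 2 NeuronCores per device on inf2.24xlarge.
      rayStartParams:
        resources: '"{\"neuron_cores\": 12}"'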

Could anyone please suggest ideas?

Are you using KubeRay + RayService or running Serve on top of EKS directly?

@Sam_Chan, I'm using KubeRay + RayService on EKS.

Could anyone please suggest what the issue could be here?
More details can be found here: [RayServe] Autoscaling Issue with Neuron Devices (Inf2), RayServe, and Karpenter on EKS · Issue #44361 · ray-project/ray · GitHub.
Thanks.