Resource utilization for RayServe in Kubernetes (AKS)

Trying to better understand resource utilization in a Kubernetes cluster for Ray Serve deployments. Assume relatively lightweight, CPU-only deployments (e.g., {num_cpus: 1} per deployment).

How do the cluster config YAML specs for worker CPU resource limits interact with my @serve.deployment deployments? For example, if my cluster config sets a worker resource limit of CPU=4, does that mean I can have 4 deployments per worker? At the 5th deployment (or deployment replica), does Ray spin up a new worker (assuming we're below max workers)? Does each deployment “reserve” its “num_cpus” from the worker pool and hold on to them forever? How would this affect CPU utilization numbers from a Kubernetes admin viewpoint?
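For concreteness, here is a minimal sketch of the kind of deployment in question, assuming the Ray Serve decorator API (the class and handler names are illustrative, not from any real codebase):

```python
from ray import serve

# Each replica of this deployment reserves 1 logical CPU from the Ray
# cluster's resource pool. This is a scheduling reservation tracked by
# Ray, not an OS-level cap on how much CPU the replica actually uses.
@serve.deployment(ray_actor_options={"num_cpus": 1})
class LightweightModel:
    def __call__(self, request):
        # Placeholder handler; real work happens here.
        return "ok"
```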

I am asking because we are seeing high CPU consumption, and I was wondering whether my cluster config settings are not optimal/best practice for my use case.


cc @Dmitri for cluster config question, @eoakes for serve questions

This sounds correct.
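The packing arithmetic the question describes can be sketched in plain Python (an illustration of how the scheduler counts reservations, not Ray's actual scheduler code):

```python
import math

def workers_needed(num_replicas: int, cpus_per_replica: int, worker_cpus: int) -> int:
    """Workers required to place all replicas, counting by logical-CPU
    reservation the way Ray's scheduler does."""
    replicas_per_worker = worker_cpus // cpus_per_replica
    return math.ceil(num_replicas / replicas_per_worker)

# With a worker CPU limit of 4 and num_cpus=1 per replica:
print(workers_needed(4, 1, 4))  # 4 replicas fit on one worker -> 1
print(workers_needed(5, 1, 4))  # the 5th replica forces a second worker -> 2
```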

> How would this affect CPU utilization numbers from a kubernetes admin viewpoint?

One deployment replica occupies one “logical” CPU, as tracked by the Ray scheduler. The actual CPU usage depends on what the deployment is actually doing.
Allocating 1 logical CPU to a Ray task/actor that actually uses significantly more than 1 core would cause problems: if you have 4 such over-active tasks/actors in a 4-CPU K8s pod, K8s’s CPU throttling mechanisms will kick in.
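For reference, the limit in question lives in the worker pod’s container spec in the cluster config. A hedged sketch of just the relevant fragment (the surrounding field names vary by launcher/operator version, but the resources block is standard Kubernetes):

```yaml
# Worker pod container spec (fragment). Ray sizes the worker's logical
# CPU pool from this, and K8s throttles the container above the limit.
resources:
  requests:
    cpu: 4
  limits:
    cpu: 4
```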

Thanks @Dmitri, all that information helps a lot. So you are saying we should be careful to get “num_cpus” on the Ray side approximately right for the work being done, so that additional K8s workers/pods can be spun up as/if needed?
Thanks again

Exactly. Underestimating num_cpus is bad for performance. Overestimating is safe but bad from a cost/utilization perspective.