Trying to better understand resource utilization in a Kubernetes cluster for Ray Serve deployments. Assume relatively lightweight, CPU-only deployments (e.g., {"num_cpus": 1} per deployment).
How do the worker CPU resource limits in the cluster config YAML interact with my @serve.deployment deployments? For example, if my cluster config sets the worker resource limit to CPU=4, does that mean I can run 4 deployment replicas per worker? At the 5th deployment (/deployment replica), does Ray spin up a new worker (assuming we're below max workers)? Does each deployment "reserve" its num_cpus from the worker pool and hold on to it forever? How would this affect CPU utilization numbers from a Kubernetes admin's viewpoint?
I am asking because we are seeing high CPU consumption, and I was wondering whether my cluster config settings are not optimal/best-practice for this use case. For concreteness, the setup is roughly what's sketched below.
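To make the question concrete, here is a minimal sketch of the setup (field names, class names, and the 4-CPU figure are illustrative, not from our actual config). On the K8s side, each worker pod is capped at 4 CPUs:

```yaml
# Hypothetical fragment of the worker pod spec in the cluster config:
# the container is limited to 4 CPUs, so Ray sees 4 logical CPUs per worker.
containers:
  - name: ray-worker
    resources:
      requests:
        cpu: "4"
      limits:
        cpu: "4"
```

and on the Serve side each replica declares 1 logical CPU:

```python
from ray import serve

# Hypothetical lightweight deployment: each replica asks the Ray
# scheduler for 1 logical CPU.
@serve.deployment(num_replicas=4, ray_actor_options={"num_cpus": 1})
class MyModel:
    def __call__(self, request):
        return "ok"

app = MyModel.bind()  # deployed with e.g. `serve.run(app)` on Ray 2.x
```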
> How would this affect CPU utilization numbers from a Kubernetes admin's viewpoint?
One deployment replica occupies one "logical" CPU, as tracked by the Ray scheduler: the replica's num_cpus is reserved from the node's logical CPU pool for as long as the replica is alive. So yes, a CPU=4 worker fits 4 such replicas, and a 5th replica that can't be scheduled is what prompts the autoscaler to bring up a new worker (up to max workers). The actual CPU usage, which is what a Kubernetes admin sees, depends on what the deployment is actually doing; Ray does not enforce the logical limit.
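A minimal sketch of that accounting using plain Ray actors (the same bookkeeping applies to Serve replicas; the node size and class name are illustrative):

```python
import ray

# Pretend this process is a single 4-CPU worker node.
ray.init(num_cpus=4)

@ray.remote(num_cpus=1)
class Replica:
    def ping(self):
        return "ok"

actors = [Replica.remote() for _ in range(4)]
ray.get([a.ping.remote() for a in actors])  # make sure all 4 are placed

# All 4 logical CPUs are now reserved, even though the actors sit idle:
print(ray.available_resources().get("CPU", 0.0))  # ~0.0

# A 5th actor would stay pending on this node; on a real cluster the
# autoscaler reacts to that pending demand by starting another worker.
```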
Allocating 1 logical CPU to a Ray task/actor that actually uses significantly more than 1 physical core would cause problems: with 4 such over-active tasks/actors in a 4-CPU K8s pod, Kubernetes's CPU throttling mechanisms kick in.
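So the knob to turn is num_cpus. As a hedged sketch (the deployment name and the 2-core figure are hypothetical), a replica that genuinely saturates ~2 cores should declare that, so the scheduler packs only 2 of them per 4-CPU worker:

```python
from ray import serve

# Hypothetical CPU-heavy deployment. Declaring num_cpus=2 makes the Ray
# scheduler place at most 2 such replicas on a 4-CPU worker, keeping
# physical usage near the pod's K8s limit.
@serve.deployment(ray_actor_options={"num_cpus": 2})
class HeavyModel:
    def __call__(self, request):
        return "ok"
```

Note that num_cpus is still only scheduling metadata; it does not cap what the replica actually consumes.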
Thanks @Dmitri, all that information helps a lot. So you are saying we should be careful to set num_cpus on the Ray side to roughly match the work each replica actually does, so that the logical accounting tracks physical usage and additional K8s workers/pods get spun up as/if needed?
Thanks again