Unable to increase the number of cores for head node

Hi, I’m running a Ray cluster on K8s and trying to assign more CPU cores to the head node. I’m using the official chart configuration for deployment, and here’s my values.yaml:
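The relevant part (assuming the standard `podTypes` layout of the Ray 1.9 chart; only the head pod type is shown, and values other than the CPU count are illustrative) looks roughly like this:

```yaml
# Sketch of the relevant values.yaml fields; key names follow the Ray 1.9 Helm chart,
# values other than CPU are illustrative.
image: rayproject/ray:1.9.0
operatorImage: rayproject/ray:1.9.0

headPodType: rayHeadType

podTypes:
  rayHeadType:
    CPU: 2          # the number of cores I want the head node to get
    memory: 2Gi     # illustrative value
    GPU: 0
```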

However, after the cluster is started, the head node’s pod YAML shows a requested CPU that is some weird number which doesn’t appear anywhere in my configs.

As a result, there’s only 1 core assigned to the Ray head.

I don’t know what else might control this setting and hope someone can shed some light.

Thanks,
-BS

That’s weird. I’ll take a look.

Thanks Dmitri. Just for your information, I’m using Ray 1.9 for now; please let me know if you need anything else.

Is the chart pulled from the Ray master branch?

I was not able to reproduce this on my first attempt.

The chart’s logic sets limits equal to requests:
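Paraphrasing the relevant template from memory rather than quoting it exactly, the resource block it renders for a pod type amounts to roughly:

```yaml
# Approximate paraphrase of the chart's rendered output, not the exact template text:
# the same CPU/memory values from the pod type are written into both requests and limits.
resources:
  requests:
    cpu: 2          # from the pod type's CPU field
    memory: 512Mi   # from the pod type's memory field
  limits:
    cpu: 2
    memory: 512Mi
```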

so this is very strange.

The Helm chart configures a “RayCluster” custom resource which is then processed by an operator. Maybe we can take a look at the intermediate RayCluster object first.

After installing the chart, could you run `kubectl -n <your namespace> get raycluster <your release name> -o yaml` and see what the requests and limits look like in that configuration?

`kubectl get raycluster -o yaml` shows the correct CPU request and limit (which is 2 in my case). Only the started head pod’s YAML somehow has a weird request number.

How should I take a look at the ‘RayCluster’ object?

Yes. We use the chart from the ray-1.9.0 release.

I did a file-by-file diff (ray-1.9.0 vs. ours).

One theory is that something in your K8s environment is mutating the requested pod.
What kind of K8s environment are you running in?
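One way to check for that kind of mutation is to list the cluster’s mutating admission webhooks with `kubectl get mutatingwebhookconfigurations` and see whether any of them target pods in your namespace.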

Thanks for the config details!

The operator image should also be pinned to Ray 1.9.0 (`operatorImage: rayproject/ray:1.9.0`).
Is that the case in your configs?

Just as a sanity check, what happens if you try to create a scratch pod (say, with a busybox image) with CPU requests=limits=2?
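Something along these lines (illustrative manifest; the pod name is arbitrary):

```yaml
# Scratch pod with CPU requests equal to limits, for comparison with the Ray head pod.
apiVersion: v1
kind: Pod
metadata:
  name: cpu-sanity-check
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "2"
      limits:
        cpu: "2"
```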

I double-checked the operator image; it’s indeed ray:1.9.0. A scratch pod seems to run fine (with the correct number of requested CPUs), and we have a bunch of other services/pods running on this K8s platform.

Regarding the K8s environment, it’s a vendor K8s platform, TKE: GitHub - tkestack/tke: Native Kubernetes container management platform supporting multi-tenant and multi-cluster

Interestingly, we just tried setting the CPU request to 4, and in this case the head pod’s CPU request becomes 1 … not 664m …

I understand you might not be able to upgrade to a newer Ray for your application, but what happens if you use base Ray 1.13.0 images instead?

Thanks Dmitri. After some investigation, it turned out to be an issue with our K8s platform (they put a hard-coded limit on the CPU resources for the test env) …
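(If anyone wants to check for that kind of cap themselves: when it’s implemented with standard Kubernetes objects it shows up under `kubectl -n <namespace> get limitrange,resourcequota -o yaml`; in our case it was something the platform applies to the test environment.)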

Appreciate your help anyway!!

Ok, thanks for letting me know. This is good for my sanity :slight_smile: