To add new GPU worker to existing CPU cluster

Balaji_MP · July 13, 2022, 4:41pm

Hello all, is it possible to add a new GPU worker node to an existing CPU-only Ray cluster? The current deployment is on OpenShift.

cade · July 13, 2022, 5:54pm

Hi @Balaji_MP! Can you share how you’re setting up your cluster? I haven’t used OpenShift before and am curious how you specify the resources you need.

Balaji_MP · July 13, 2022, 6:49pm

Hello @cade, I used the recommended Helm chart (ray/deploy/charts/ray at master in the ray-project/ray GitHub repository) for the deployment and modified the values as required. One key change: instead of the latest image, I used ray:nightly-py39-cpu.

Dmitri · July 13, 2022, 6:57pm

Hi @Balaji_MP!
The most critical details for setting up pods that use GPUs will come from the OpenShift docs.
I think this page should be useful: "Installing the NVIDIA GPU Operator" in the NVIDIA Cloud Native Technologies documentation.

Once the OpenShift GPU setup is complete, you should be able to add a new worker type that uses a GPU:

https://github.com/ray-project/ray/blob/ab10890e908abbf50e33796e20f7dd5a3ac006e0/deploy/charts/ray/values.yaml#L85

        #   ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
        #   Note that it is often not necessary to manually specify tolerations for GPU
        #   usage on managed platforms such as AKS, EKS, and GKE.
        #   ref: https://docs.ray.io/en/master/cluster/kubernetes-gpu.html
        tolerations: []
        # - key: nvidia.com/gpu
        #   operator: Exists
        #   effect: NoSchedule

    # Optionally, define more worker podTypes
    # rayWorkerType2:
    #   minWorkers: 0
    #   maxWorkers: 10
    #   memory: ...

# Operator settings:

# operatorOnly - If true, will only set up the Operator with this release,
# without launching a Ray cluster.
operatorOnly: false
# clusterOnly - If true, will only create a RayCluster resource with this release,

The key fields to set for the worker type are GPU, to indicate the number of GPUs; possibly nodeSelector, to make sure the pod is scheduled on the right node; and possibly tolerations, to tolerate a GPU taint.
nodeSelector and tolerations may or may not be necessary depending on the details of your GPU setup. Basically, you’d set whatever you need to get a GPU-utilizing pod running in your environment.
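
As a rough illustration of the fields mentioned above, a second worker type in values.yaml might look something like the sketch below. The type name rayGpuWorkerType, the resource sizes, and the nodeSelector label are assumptions made up for this example. The label shown is the kind typically applied by the NVIDIA GPU Operator's feature discovery, but check the labels and taints on your own GPU nodes before relying on it.

```yaml
podTypes:
    # Existing rayHeadType and rayWorkerType entries stay unchanged.
    rayGpuWorkerType:            # hypothetical name for the new GPU worker type
        minWorkers: 0
        maxWorkers: 2            # illustrative limits, adjust to your cluster
        memory: 8Gi              # illustrative size
        CPU: 4                   # illustrative size
        GPU: 1                   # number of GPUs requested per worker pod
        rayResources: {}
        # Only needed if GPU pods must be pinned to specific nodes.
        nodeSelector:
            nvidia.com/gpu.present: "true"   # assumed label, verify on your nodes
        # Only needed if the GPU nodes carry a matching taint.
        tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```

If your GPU nodes have no taint and any node with a free GPU is acceptable, nodeSelector and tolerations can be left as {} and [] respectively.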

Balaji_MP · July 13, 2022, 7:01pm

Thanks for your quick response. One question: is it possible to provide a different image for the type2 worker?

Balaji_MP · July 13, 2022, 7:28pm

Please ignore the above question. I deployed based on your suggestion and it worked. Thank you.

Dmitri · July 13, 2022, 7:58pm

The above question is a good one!

For simplicity, we don’t currently support setting images per worker type in the Helm chart. However, it’s not too hard to edit the chart to support this functionality, by slightly modifying the RayCluster template.
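
A minimal sketch of what such a modification could look like, not something the stock chart supports: assuming the RayCluster template loops over podTypes with a variable such as $val and takes the container image from the top-level image value, a hypothetical per-type image key could override it and fall back to the global image when absent. The per-type image key, the $val variable name, the rayGpuWorkerType name, and the image tags are all assumptions for illustration.

```yaml
# values.yaml (hypothetical per-type override; not part of the stock chart)
image: rayproject/ray:nightly-py39-cpu           # global default image (assumed tag)
podTypes:
    rayGpuWorkerType:                            # hypothetical GPU worker type
        image: rayproject/ray:nightly-py39-gpu   # assumed GPU image tag
        GPU: 1

# In the RayCluster template, inside the loop over podTypes (assumed variable $val),
# the container image line could then become:
#   image: {{ $val.image | default $.Values.image }}
```

Worker types that omit the image key would keep using the global image, so existing values files would continue to work unchanged.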
