Autoscaler not scaling up the worker node when using image rayproject/ray:1.11.0-py38

Susanta_Gautam · March 22, 2022, 4:28am

I have installed a Ray cluster on EKS using the Ray Operator. The operator image is rayproject/ray:1.11.0-py38.
I have tried two setups. The first has 1 head node, with the workers configured to autoscale from 0 to 50. In this setup no worker nodes are created and all the computation happens on the head node. Even when I set rayResource to 0, all the actors/jobs stay in the pending state. The same happens when the workers are set to autoscale from 1 to 50: no worker nodes are created. I checked the autoscaler logs and found the following.

Resources
---------------------------------------------------------------
Usage:
 0.0/2.0 CPU
 0.00/2.877 GiB memory
 0.00/0.673 GiB object_store_memory

Demands:
 {}: 2+ pending tasks/actors

When I use the image rayproject/ray:v1.11.0, it works fine. However, that image ships Python 3.7, whereas the application I am working on requires Python 3.8.x.

Alternatively, I replaced the Ray operator with KubeRay, but the issue is the same. I used the following container spec and the result is the same.

- name: autoscaler
  image: rayproject/ray:d3159f-py38
  imagePullPolicy: IfNotPresent
  env:
    - name: RAY_CLUSTER_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: RAY_CLUSTER_NAME
      value: prediction-ray-cluster
  command: ["ray"]
  args:
    - "kuberay-autoscaler"
    - "--cluster-name"
    - "$(RAY_CLUSTER_NAME)"
    - "--cluster-namespace"
    - "$(RAY_CLUSTER_NAMESPACE)"
  resources:
    limits:
      cpu: "500m"
      memory: "1024Mi"
    requests:
      cpu: "250m"
      memory: "512Mi"
  volumeMounts:
    - mountPath: /tmp/ray
      name: ray-logs

Am I missing some configuration? I have tried many things but I am stuck on how to proceed.
Thank You

Ameer_Haj_Ali · April 25, 2022, 11:48am

@Alex , can you please help Susanta?

Alex · April 25, 2022, 3:08pm

@Susanta_Gautam I noticed your code snippet refers to an autoscaler node. Are you also changing the container image name on the head and worker nodes?

Susanta_Gautam · July 2, 2022, 3:32am

Hi @Alex, sorry for the late reply.

Actually, I got it working by defining the resource request for the computation.

Thank You
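
For readers hitting the same symptom: the `{}` entry under "Demands" in the autoscaler status earlier in this thread suggests that the pending tasks/actors were requesting no resources, which would be consistent with the fix of declaring a resource request. Below is a minimal sketch, not part of Ray, for spotting such empty demands in a pasted status dump. The helper names (`parse_demands`, `empty_demands`) and the regular expression are assumptions based only on the log format shown in this thread.

```python
import ast
import re

# Assumed line shape, taken from the log in this thread, e.g.
#   " {}: 2+ pending tasks/actors"
#   " {'CPU': 1.0}: 3+ pending tasks/actors"
DEMAND_LINE = re.compile(r"^\s*(\{.*\}):\s*(\d+)\+?\s+pending tasks/actors")


def parse_demands(status_text):
    """Return (resource_bundle, count) pairs found after the 'Demands:' line."""
    demands = []
    in_demands = False
    for line in status_text.splitlines():
        if line.strip() == "Demands:":
            in_demands = True
            continue
        if not in_demands:
            continue
        match = DEMAND_LINE.match(line)
        if match:
            # The bundle is printed as a Python dict literal.
            bundle = ast.literal_eval(match.group(1))
            demands.append((bundle, int(match.group(2))))
    return demands


def empty_demands(demands):
    """Keep only demands whose resource bundle is empty."""
    return [(bundle, count) for bundle, count in demands if not bundle]


# Status text copied from the original post.
STATUS = """\
Usage:
 0.0/2.0 CPU
 0.00/2.877 GiB memory
 0.00/0.673 GiB object_store_memory

Demands:
 {}: 2+ pending tasks/actors
"""

if __name__ == "__main__":
    for bundle, count in empty_demands(parse_demands(STATUS)):
        print(f"{count}+ pending tasks/actors request no resources: {bundle}")
```

On the status text from the original post, `empty_demands` returns a single entry with an empty bundle and a count of 2. One way to give a Ray task or actor a non-empty request is `@ray.remote(num_cpus=1)`; the thread does not say exactly which request was defined, so treat that as an illustration rather than the confirmed fix.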
