Is it possible to set a default node-type?

We have multiple node types in our cluster configuration, and the cluster scales down to a single head node, which is configured with 0 CPUs so that no tasks are scheduled on it.

When we start a task from our application with the memory option set, everything works fine: the cluster starts a suitable node.
When we start a task without the memory option set, the autoscaler selects a node type (worker-node-cpu-highmem-8) that is more costly than we would like. Since we set the memory option on almost all of our Ray functions, this has not been a big issue so far. However, we would like to start using the Ray client (ray.init("ray://.....")), and that runs into the same problem: connecting makes the cluster create one of the costly nodes, just as it does for a task without a memory requirement.
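
For context, this is roughly what the two cases look like on our side (the function names and the 10 GiB figure are just illustrative):

import ray

# Memory option set: the autoscaler sees the requirement and starts a node type that fits.
@ray.remote(memory=10 * 1024**3)  # 10 GiB
def task_with_memory():
    ...

# No memory option: the autoscaler currently brings up worker-node-cpu-highmem-8 for this.
@ray.remote
def task_without_memory():
    ...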

Is it possible to set a default node type such that the auto-scaling will choose that node type if the pending tasks do not have any memory requirements?
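
Until something like that exists, the closest application-side workaround we can think of is a thin wrapper around ray.remote that fills in a default memory requirement when none is given. A rough sketch (the helper name and the 4 GiB default are ours, purely illustrative):

import ray

# Default sized so that the cheaper worker-node-cpu type can satisfy it.
DEFAULT_TASK_MEMORY = 4 * 1024**3  # 4 GiB

def default_remote(func=None, **options):
    """Like ray.remote, but adds a default memory requirement if none is set."""
    options.setdefault("memory", DEFAULT_TASK_MEMORY)
    if func is None:
        # Used with arguments, e.g. @default_remote(num_cpus=2)
        return lambda f: ray.remote(**options)(f)
    # Used bare, e.g. @default_remote
    return ray.remote(**options)(func)

@default_remote
def some_task():
    ...

This does not help with the tasks the Ray client itself launches during startup, though, which is why we are falling back to the patch at the end of this post.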

Would it also be possible to file a feature request for the Ray client to specify memory requirements on the tasks it creates during startup?

Here is our Ray config:

apiVersion: cluster.ray.io/v1
kind: RayCluster
metadata:
  name: ray-cluster
spec:
  # The maximum number of worker nodes to launch in addition to the head node.
  maxWorkers: 50
  # The autoscaler will scale up the cluster faster with higher upscaling speed.
  # E.g., if the task requires adding more nodes, the autoscaler will gradually
  # scale up the cluster in chunks of upscaling_speed*currently_running_nodes.
  # This number should be > 0.
  upscalingSpeed: 10.0
  # If a node is idle for this many minutes, it will be removed.
  idleTimeoutMinutes: 10
  # Specify the pod type for the ray head node (as configured below).
  headPodType: head-node
  # Optionally, configure ports for the Ray head service.
  # The ports specified below are the defaults.
  headServicePorts:
    - name: client
      port: 10001
      targetPort: 10001
    - name: dashboard
      port: 8265
      targetPort: 8265
    - name: ray-serve
      port: 8000
      targetPort: 8000
    - name: redis-primary
      port: 6379
      targetPort: 6379
  # Specify the allowed pod types for this ray cluster and the resources they provide.
  podTypes:
  - name: head-node
    # Minimum number of Ray workers of this Pod type.
    minWorkers: 0
    # Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
    maxWorkers: 0
    # Prevent tasks on head node. https://docs.ray.io/en/master/cluster/guide.html#configuring-the-head-node
    rayResources: {"CPU": 0}
    podConfig:
      apiVersion: v1
      kind: Pod
      metadata:
        # The operator automatically prepends the cluster name to this field.
        generateName: ray-head-
      spec:
        tolerations:
        - key: imerso-ray-head
          operator: Equal
          value: "true"
          effect: NoSchedule
        restartPolicy: Never
        nodeSelector:
          imerso-ray-head: "true"

        # This volume allocates shared memory for Ray to use for its plasma
        # object store. If you do not provide this, Ray will fall back to
        # /tmp, which will cause slowdowns if it is not a shared memory volume.
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
        - name: filestore-ray
          persistentVolumeClaim:
            claimName: fileserver-ray-claim
            readOnly: false
        containers:
        - name: ray-node
          imagePullPolicy: Always
          image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
          # Do not change this command - it keeps the pod alive until it is
          # explicitly killed.
          command: ["/bin/bash", "-c", "--"]
          args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
          ports:
          - containerPort: 6379  # Redis port
          - containerPort: 10001  # Used by Ray Client
          - containerPort: 8265  # Used by Ray Dashboard
          - containerPort: 8000 # Used by Ray Serve

          # This volume allocates shared memory for Ray to use for its plasma
          # object store. If you do not provide this, Ray will fall back to
          # /tmp, which will cause slowdowns if it is not a shared memory volume.
          volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          - mountPath: /filestore
            name: filestore-ray
          resources:
            requests:
              cpu: 1
              memory: 5Gi
            limits:
              memory: 5Gi
  - name: worker-node-cpu
    # Minimum number of Ray workers of this Pod type.
    minWorkers: 0
    # Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
    maxWorkers: 50
    # User-specified custom resources for use by Ray.
    # (Ray detects CPU and GPU from pod spec resource requests and limits, so no need to fill those here.)
    # rayResources: {"example-resource-a": 1, "example-resource-b": 1}
    podConfig:
      apiVersion: v1
      kind: Pod
      metadata:
        # The operator automatically prepends the cluster name to this field.
        generateName: ray-worker-cpu-
      spec:
        tolerations:
        - key: cloud.google.com/gke-preemptible
          operator: Equal
          value: "true"
          effect: NoSchedule
        - key: imerso-ray-worker
          operator: Equal
          value: "true"
          effect: NoSchedule
        serviceAccountName: ray-staging
        restartPolicy: Never
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
        - name: filestore-ray
          persistentVolumeClaim:
            claimName: fileserver-ray-claim
            readOnly: false
        containers:
        - name: ray-node
          imagePullPolicy: Always
          image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
          command: ["/bin/bash", "-c", "--"]
          args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
          # This volume allocates shared memory for Ray to use for its plasma
          # object store. If you do not provide this, Ray will fall back to
          # /tmp, which will cause slowdowns if it is not a shared memory volume.
          volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          - mountPath: /filestore
            name: filestore-ray
          resources:
            requests:
              cpu: 7
              memory: 26G
            limits:
              memory: 26G
  - name: worker-node-cpu-highmem-8
    # Minimum number of Ray workers of this Pod type.
    minWorkers: 0
    # Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
    maxWorkers: 5
    # User-specified custom resources for use by Ray.
    # (Ray detects CPU and GPU from pod spec resource requests and limits, so no need to fill those here.)
    # rayResources: {"example-resource-a": 1, "example-resource-b": 1}
    podConfig:
      apiVersion: v1
      kind: Pod
      metadata:
        # The operator automatically prepends the cluster name to this field.
        generateName: ray-worker-cpu-highmem-8-
      spec:
        tolerations:
        - key: cloud.google.com/gke-preemptible
          operator: Equal
          value: "true"
          effect: NoSchedule
        - key: imerso-ray-worker
          operator: Equal
          value: "true"
          effect: NoSchedule
        - key: imerso-ray-worker-highmem-8
          operator: Equal
          value: "true"
          effect: NoSchedule
        serviceAccountName: ray-staging
        restartPolicy: Never
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
        - name: filestore-ray
          persistentVolumeClaim:
            claimName: fileserver-ray-claim
            readOnly: false
        containers:
        - name: ray-node
          imagePullPolicy: Always
          image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
          command: ["/bin/bash", "-c", "--"]
          args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
          # This volume allocates shared memory for Ray to use for its plasma
          # object store. If you do not provide this, Ray will fall back to
          # /tmp, which will cause slowdowns if it is not a shared memory volume.
          volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          - mountPath: /filestore
            name: filestore-ray
          resources:
            requests:
              cpu: 7
              memory: 60G
            limits:
              memory: 60G
  - name: worker-node-cpu-highmem-16
    # Minimum number of Ray workers of this Pod type.
    minWorkers: 0
    # Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
    maxWorkers: 5
    # User-specified custom resources for use by Ray.
    # (Ray detects CPU and GPU from pod spec resource requests and limits, so no need to fill those here.)
    # rayResources: {"example-resource-a": 1, "example-resource-b": 1}
    podConfig:
      apiVersion: v1
      kind: Pod
      metadata:
        # The operator automatically prepends the cluster name to this field.
        generateName: ray-worker-cpu-highmem-16-
      spec:
        tolerations:
        - key: cloud.google.com/gke-preemptible
          operator: Equal
          value: "true"
          effect: NoSchedule
        - key: imerso-ray-worker
          operator: Equal
          value: "true"
          effect: NoSchedule
        - key: imerso-ray-worker-highmem-16
          operator: Equal
          value: "true"
          effect: NoSchedule
        serviceAccountName: ray-staging
        restartPolicy: Never
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
        - name: filestore-ray
          persistentVolumeClaim:
            claimName: fileserver-ray-claim
            readOnly: false
        containers:
        - name: ray-node
          imagePullPolicy: Always
          image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
          command: ["/bin/bash", "-c", "--"]
          args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
          # This volume allocates shared memory for Ray to use for its plasma
          # object store. If you do not provide this, Ray will fall back to
          # /tmp, which will cause slowdowns if it is not a shared memory volume.
          volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          - mountPath: /filestore
            name: filestore-ray
          resources:
            requests:
              cpu: 15
              memory: 124G
            limits:
              memory: 124G
  - name: worker-node-gpu
    # Minimum number of Ray workers of this Pod type.
    minWorkers: 0
    # Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
    maxWorkers: 20
    # User-specified custom resources for use by Ray.
    # (Ray detects CPU and GPU from pod spec resource requests and limits, so no need to fill those here.)
    # rayResources: {"example-resource-a": 1, "example-resource-b": 1}
    podConfig:
      apiVersion: v1
      kind: Pod
      metadata:
        # The operator automatically prepends the cluster name to this field.
        generateName: ray-worker-gpu-
      spec:
        tolerations:
        - key: cloud.google.com/gke-preemptible
          operator: Equal
          value: "true"
          effect: NoSchedule
        - key: imerso-ray-worker
          operator: Equal
          value: "true"
          effect: NoSchedule
        serviceAccountName: ray-staging
        restartPolicy: Never
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
        - name: filestore-ray
          persistentVolumeClaim:
            claimName: fileserver-ray-claim
            readOnly: false
        containers:
        - name: ray-node
          imagePullPolicy: Always
          image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
          command: ["/bin/bash", "-c", "--"]
          args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
          # This volume allocates shared memory for Ray to use for its plasma
          # object store. If you do not provide this, Ray will fall back to
          # /tmp, which will cause slowdowns if it is not a shared memory volume.
          volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          - mountPath: /filestore
            name: filestore-ray
          resources:
            requests:
              cpu: 7
              memory: 26G
            limits:
              memory: 26G
              nvidia.com/gpu: 1
  # Commands to start Ray on the head node. You don't need to change this.
  # Note dashboard-host is set to 0.0.0.0 so that Kubernetes can port forward.
  headStartRayCommands:
    - ray stop
    - ulimit -n 65536; export AUTOSCALER_MAX_NUM_FAILURES=inf; ray start --head --num-cpus=0 --object-store-memory 1073741824 --no-monitor --dashboard-host 0.0.0.0 &> /tmp/raylogs
  # Commands to start Ray on worker nodes. You don't need to change this.
  workerStartRayCommands:
    - ray stop
    - ulimit -n 65536; ray start --object-store-memory 1073741824 --address=$RAY_HEAD_IP:6379 &> /tmp/raylogs

This might not be the best approach, but we will be using the following patch until a better solution exists:

--- /usr/local/lib/python3.8/dist-packages/ray/client_builder.py
+++ /usr/local/lib/python3.8/dist-packages/ray/client_builder.py
@@ -142,7 +142,8 @@ class ClientBuilder:
             _credentials=self._credentials,
             ray_init_kwargs=self._remote_init_kwargs)
         dashboard_url = ray.get(
-            ray.remote(ray.worker.get_dashboard_url).remote())
+            ray.remote(memory=200 * 1024**2)(
+                ray.worker.get_dashboard_url).remote())
         cxt = ClientContext(
             dashboard_url=dashboard_url,
             python_version=client_info_dict["python_version"],
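
With this change, the get_dashboard_url task that the client launches while connecting carries a 200 MiB memory request, so the autoscaler should be able to satisfy it with the cheaper worker-node-cpu type instead of worker-node-cpu-highmem-8. Connecting itself is unchanged (the address below is just a placeholder for our head service):

import ray

# The dashboard-url task started during connect now has an explicit memory request.
ctx = ray.init("ray://<head-service-host>:10001")
print(ctx.dashboard_url)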