We have multiple node types in our cluster configuration, and it scales down to a single head node, which is set to have 0 CPUs so that it does not run any workers.
When we start a task from our application with the memory option set, everything works fine: the cluster starts up a suitable node.
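For reference, this is roughly how we set (or omit) the memory option on our tasks; the function names and the figure below are just placeholders:

import ray

# With an explicit memory requirement, the autoscaler can pick the
# cheapest node type that satisfies it (the value here is illustrative).
@ray.remote(memory=4 * 1024 ** 3)  # 4 GiB
def process_scan(path):
    ...

# Without the memory option, only the default 1 CPU is requested and the
# node type choice is left entirely to the autoscaler.
@ray.remote
def small_task(x):
    return x + 1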
When we start a task without the memory option set, the autoscaler selects a node type (worker-node-cpu-highmem-8) that is more costly than we would like. Since we set the memory option on almost all of our Ray functions, this has not been a big issue for us. However, we would like to start using the Ray client (ray.init("ray://.....")), and that is a problem for us: connecting a client causes the cluster to create one of these costly nodes, just as it does for tasks without a memory requirement.
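The client connection itself is just the standard connect call against the client port from the config below; it declares no memory requirement, which is presumably why it triggers the same costly scale-up (the hostname here is a placeholder):

import ray

# Connecting through the Ray client; 10001 matches the "client" port in
# the head service config below. The hostname is a placeholder.
ray.init("ray://ray-cluster-ray-head:10001")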
Is it possible to set a default node type so that the autoscaler chooses it when pending tasks do not have any memory requirements?
Could we also file a feature request to have the Ray client specify memory requirements for the tasks it creates during startup?
Here is our Ray config:
apiVersion: cluster.ray.io/v1
kind: RayCluster
metadata:
name: ray-cluster
spec:
# The maximum number of worker nodes to launch in addition to the head node.
maxWorkers: 50
# The autoscaler will scale up the cluster faster with higher upscaling speed.
# E.g., if the task requires adding more nodes then the autoscaler will gradually
# scale up the cluster in chunks of upscaling_speed*currently_running_nodes.
# This number should be > 0.
upscalingSpeed: 10.0
# If a node is idle for this many minutes, it will be removed.
idleTimeoutMinutes: 10
# Specify the pod type for the ray head node (as configured below).
headPodType: head-node
# Optionally, configure ports for the Ray head service.
# The ports specified below are the defaults.
headServicePorts:
- name: client
port: 10001
targetPort: 10001
- name: dashboard
port: 8265
targetPort: 8265
- name: ray-serve
port: 8000
targetPort: 8000
- name: redis-primary
port: 6379
targetPort: 6379
# Specify the allowed pod types for this ray cluster and the resources they provide.
podTypes:
- name: head-node
# Minimum number of Ray workers of this Pod type.
minWorkers: 0
# Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
maxWorkers: 0
# Prevent tasks on head node. https://docs.ray.io/en/master/cluster/guide.html#configuring-the-head-node
rayResources: {"CPU": 0}
podConfig:
apiVersion: v1
kind: Pod
metadata:
# The operator automatically prepends the cluster name to this field.
generateName: ray-head-
spec:
tolerations:
- key: imerso-ray-head
operator: Equal
value: "true"
effect: NoSchedule
restartPolicy: Never
nodeSelector:
imerso-ray-head: "true"
# This volume allocates shared memory for Ray to use for its plasma
# object store. If you do not provide this, Ray will fall back to
# /tmp, which can cause slowdowns if it is not a shared memory volume.
volumes:
- name: dshm
emptyDir:
medium: Memory
- name: filestore-ray
persistentVolumeClaim:
claimName: fileserver-ray-claim
readOnly: false
containers:
- name: ray-node
imagePullPolicy: Always
image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
# Do not change this command - it keeps the pod alive until it is
# explicitly killed.
command: ["/bin/bash", "-c", "--"]
args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
ports:
- containerPort: 6379 # Redis port
- containerPort: 10001 # Used by Ray Client
- containerPort: 8265 # Used by Ray Dashboard
- containerPort: 8000 # Used by Ray Serve
# This volume allocates shared memory for Ray to use for its plasma
# object store. If you do not provide this, Ray will fall back to
# /tmp, which can cause slowdowns if it is not a shared memory volume.
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /filestore
name: filestore-ray
resources:
requests:
cpu: 1
memory: 5Gi
limits:
memory: 5Gi
- name: worker-node-cpu
# Minimum number of Ray workers of this Pod type.
minWorkers: 0
# Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
maxWorkers: 50
# User-specified custom resources for use by Ray.
# (Ray detects CPU and GPU from pod spec resource requests and limits, so no need to fill those here.)
# rayResources: {"example-resource-a": 1, "example-resource-b": 1}
podConfig:
apiVersion: v1
kind: Pod
metadata:
# The operator automatically prepends the cluster name to this field.
generateName: ray-worker-cpu-
spec:
tolerations:
- key: cloud.google.com/gke-preemptible
operator: Equal
value: "true"
effect: NoSchedule
- key: imerso-ray-worker
operator: Equal
value: "true"
effect: NoSchedule
serviceAccountName: ray-staging
restartPolicy: Never
volumes:
- name: dshm
emptyDir:
medium: Memory
- name: filestore-ray
persistentVolumeClaim:
claimName: fileserver-ray-claim
readOnly: false
containers:
- name: ray-node
imagePullPolicy: Always
image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
command: ["/bin/bash", "-c", "--"]
args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
# This volume allocates shared memory for Ray to use for its plasma
# object store. If you do not provide this, Ray will fall back to
# /tmp, which can cause slowdowns if it is not a shared memory volume.
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /filestore
name: filestore-ray
resources:
requests:
cpu: 7
memory: 26G
limits:
memory: 26G
- name: worker-node-cpu-highmem-8
# Minimum number of Ray workers of this Pod type.
minWorkers: 0
# Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
maxWorkers: 5
# User-specified custom resources for use by Ray.
# (Ray detects CPU and GPU from pod spec resource requests and limits, so no need to fill those here.)
# rayResources: {"example-resource-a": 1, "example-resource-b": 1}
podConfig:
apiVersion: v1
kind: Pod
metadata:
# The operator automatically prepends the cluster name to this field.
generateName: ray-worker-cpu-highmem-8-
spec:
tolerations:
- key: cloud.google.com/gke-preemptible
operator: Equal
value: "true"
effect: NoSchedule
- key: imerso-ray-worker
operator: Equal
value: "true"
effect: NoSchedule
- key: imerso-ray-worker-highmem-8
operator: Equal
value: "true"
effect: NoSchedule
serviceAccountName: ray-staging
restartPolicy: Never
volumes:
- name: dshm
emptyDir:
medium: Memory
- name: filestore-ray
persistentVolumeClaim:
claimName: fileserver-ray-claim
readOnly: false
containers:
- name: ray-node
imagePullPolicy: Always
image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
command: ["/bin/bash", "-c", "--"]
args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
# This volume allocates shared memory for Ray to use for its plasma
# object store. If you do not provide this, Ray will fall back to
# /tmp, which can cause slowdowns if it is not a shared memory volume.
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /filestore
name: filestore-ray
resources:
requests:
cpu: 7
memory: 60G
limits:
memory: 60G
- name: worker-node-cpu-highmem-16
# Minimum number of Ray workers of this Pod type.
minWorkers: 0
# Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
maxWorkers: 5
# User-specified custom resources for use by Ray.
# (Ray detects CPU and GPU from pod spec resource requests and limits, so no need to fill those here.)
# rayResources: {"example-resource-a": 1, "example-resource-b": 1}
podConfig:
apiVersion: v1
kind: Pod
metadata:
# The operator automatically prepends the cluster name to this field.
generateName: ray-worker-cpu-highmem-16-
spec:
tolerations:
- key: cloud.google.com/gke-preemptible
operator: Equal
value: "true"
effect: NoSchedule
- key: imerso-ray-worker
operator: Equal
value: "true"
effect: NoSchedule
- key: imerso-ray-worker-highmem-16
operator: Equal
value: "true"
effect: NoSchedule
serviceAccountName: ray-staging
restartPolicy: Never
volumes:
- name: dshm
emptyDir:
medium: Memory
- name: filestore-ray
persistentVolumeClaim:
claimName: fileserver-ray-claim
readOnly: false
containers:
- name: ray-node
imagePullPolicy: Always
image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
command: ["/bin/bash", "-c", "--"]
args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
# This volume allocates shared memory for Ray to use for its plasma
# object store. If you do not provide this, Ray will fall back to
# /tmp, which can cause slowdowns if it is not a shared memory volume.
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /filestore
name: filestore-ray
resources:
requests:
cpu: 15
memory: 124G
limits:
memory: 124G
- name: worker-node-gpu
# Minimum number of Ray workers of this Pod type.
minWorkers: 0
# Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
maxWorkers: 20
# User-specified custom resources for use by Ray.
# (Ray detects CPU and GPU from pod spec resource requests and limits, so no need to fill those here.)
# rayResources: {"example-resource-a": 1, "example-resource-b": 1}
podConfig:
apiVersion: v1
kind: Pod
metadata:
# The operator automatically prepends the cluster name to this field.
generateName: ray-worker-gpu-
spec:
tolerations:
- key: cloud.google.com/gke-preemptible
operator: Equal
value: "true"
effect: NoSchedule
- key: imerso-ray-worker
operator: Equal
value: "true"
effect: NoSchedule
serviceAccountName: ray-staging
restartPolicy: Never
volumes:
- name: dshm
emptyDir:
medium: Memory
- name: filestore-ray
persistentVolumeClaim:
claimName: fileserver-ray-claim
readOnly: false
containers:
- name: ray-node
imagePullPolicy: Always
image: eu.gcr.io/imerso-3dscanner-backend/imerso-ray:${VERSION_TAG}
command: ["/bin/bash", "-c", "--"]
args: ["trap : TERM INT; touch /tmp/raylogs; tail -f /tmp/raylogs; sleep infinity & wait;"]
# This volume allocates shared memory for Ray to use for its plasma
# object store. If you do not provide this, Ray will fall back to
# /tmp, which can cause slowdowns if it is not a shared memory volume.
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /filestore
name: filestore-ray
resources:
requests:
cpu: 7
memory: 26G
limits:
memory: 26G
nvidia.com/gpu: 1
# Commands to start Ray on the head node. You don't need to change this.
# Note dashboard-host is set to 0.0.0.0 so that Kubernetes can port forward.
headStartRayCommands:
- ray stop
- ulimit -n 65536; export AUTOSCALER_MAX_NUM_FAILURES=inf; ray start --head --num-cpus=0 --object-store-memory 1073741824 --no-monitor --dashboard-host 0.0.0.0 &> /tmp/raylogs
# Commands to start Ray on worker nodes. You don't need to change this.
workerStartRayCommands:
- ray stop
- ulimit -n 65536; ray start --object-store-memory 1073741824 --address=$RAY_HEAD_IP:6379 &> /tmp/raylogs