When I just copy this file (ray/example-full.yaml at master · ray-project/ray · GitHub) and change the value of min_workers (ray/example-full.yaml at 6af02913470fa9f394e0fe37b8a7115e10a29a2b · ray-project/ray · GitHub) to something > 0, Ray doesn't seem to provision the corresponding number of workers and leave them running; in some cases no worker at all is available after the startup procedure.
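To be concrete, the only edit relative to the example file is the per-node-type min_workers under available_node_types, roughly like this (the value shown is just an example; the full YAML I used is posted further down):

```
available_node_types:
  worker_node:
    # Only change relative to example-full.yaml: raise min_workers from the
    # example's default to something > 0, e.g. 10.
    min_workers: 10
```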
When watching what happens during startup, I can see that for a very short period of time a larger number of pods gets provisioned, but then deprovisioned again just a few seconds later. Below I've included the logs. I tested this on Kubernetes 1.18.16 and Ray 1.2.
Has anyone else seen this before?
```
2021-02-24 02:36:04,189 INFO autoscaler.py:203 -- StandardAutoscaler: example-cluster-ray-worker-9rxwz: Terminating outdated node.
2021-02-24 02:36:04,252 INFO autoscaler.py:203 -- StandardAutoscaler: example-cluster-ray-worker-ggx4d: Terminating outdated node.
2021-02-24 02:36:04,316 INFO autoscaler.py:203 -- StandardAutoscaler: example-cluster-ray-worker-n6v72: Terminating outdated node.
2021-02-24 02:36:04,376 INFO autoscaler.py:203 -- StandardAutoscaler: example-cluster-ray-worker-q5czz: Terminating outdated node.
2021-02-24 02:36:04,440 INFO autoscaler.py:203 -- StandardAutoscaler: example-cluster-ray-worker-wjrjj: Terminating outdated node.
2021-02-24 02:36:04,502 INFO autoscaler.py:203 -- StandardAutoscaler: example-cluster-ray-worker-wm7wf: Terminating outdated node.
2021-02-24 02:36:04,541 INFO node_provider.py:148 -- KubernetesNodeProvider: calling delete_namespaced_pod
2021-02-24 02:36:04,566 INFO node_provider.py:148 -- KubernetesNodeProvider: calling delete_namespaced_pod
2021-02-24 02:36:04,592 INFO node_provider.py:148 -- KubernetesNodeProvider: calling delete_namespaced_pod
2021-02-24 02:36:04,614 INFO node_provider.py:148 -- KubernetesNodeProvider: calling delete_namespaced_pod
2021-02-24 02:36:04,635 INFO node_provider.py:148 -- KubernetesNodeProvider: calling delete_namespaced_pod
2021-02-24 02:36:04,657 INFO node_provider.py:148 -- KubernetesNodeProvider: calling delete_namespaced_pod
2021-02-24 02:36:04,680 INFO node_provider.py:148 -- KubernetesNodeProvider: calling delete_namespaced_pod
2021-02-24 02:36:04,743 INFO autoscaler.py:221 -- StandardAutoscaler: example-cluster-ray-worker-xxsgg: Terminating unneeded node.
2021-02-24 02:36:04,757 INFO autoscaler.py:221 -- StandardAutoscaler: example-cluster-ray-worker-wm7wf: Terminating unneeded node.
2021-02-24 02:36:04,770 INFO autoscaler.py:221 -- StandardAutoscaler: example-cluster-ray-worker-wjrjj: Terminating unneeded node.
2021-02-24 02:36:04,782 INFO node_provider.py:148 -- KubernetesNodeProvider: calling delete_namespaced_pod
2021-02-24 02:36:04,806 INFO node_provider.py:148 -- KubernetesNodeProvider: calling delete_namespaced_pod
```

```
kubectl -n ray get pods
NAME                               READY   STATUS        RESTARTS   AGE
example-cluster-ray-head-vs46h     1/1     Running       0          32s
example-cluster-ray-worker-24wzc   1/1     Terminating   0          7s
example-cluster-ray-worker-2rx2w   1/1     Terminating   0          7s
example-cluster-ray-worker-5jgvg   0/1     Terminating   0          7s
example-cluster-ray-worker-5rm7l   1/1     Terminating   0          7s
example-cluster-ray-worker-8l5dp   0/1     Terminating   0          7s
example-cluster-ray-worker-bgsgn   1/1     Terminating   0          7s
example-cluster-ray-worker-dmhrd   1/1     Terminating   0          7s
example-cluster-ray-worker-hbr6j   1/1     Terminating   0          7s
example-cluster-ray-worker-tcwz7   1/1     Terminating   0          7s
example-cluster-ray-worker-www88   1/1     Terminating   0          7s
```
I couldn't replicate the problem: I tried modifying example-full to have available_node_types.worker_node.min_workers set to 1 and running `ray up` with Ray 1.2 and Kubernetes 1.18.16.
Sure – it's running on VMs with 8 CPUs and 32 GB of memory, and the Kubernetes installation is set up from scratch. Are there specific aspects you'd need? The container images are also pulled fresh.
For reference, here is the cluster YAML exactly as I used it:
```
cluster_name: example-cluster

# The maximum number of workers nodes to launch in addition to the head
# node.
max_workers: 200

# The autoscaler will scale up the cluster faster with higher upscaling speed.
# E.g., if the task requires adding more nodes then autoscaler will gradually
# scale up the cluster in chunks of upscaling_speed*currently_running_nodes.
# This number should be > 0.
upscaling_speed: 1.0

# If a node is idle for this many minutes, it will be removed.
idle_timeout_minutes: 5

# Kubernetes resources that need to be configured for the autoscaler to be
# able to manage the Ray cluster. If any of the provided resources don't
# exist, the autoscaler will attempt to create them. If this fails, you may
# not have the required permissions and will have to request them to be
# created by your cluster administrator.
provider:
  type: kubernetes

  # Exposing external IP addresses for ray pods isn't currently supported.
  use_internal_ips: true

  # Namespace to use for all resources created.
  namespace: ray

  # ServiceAccount created by the autoscaler for the head node pod that it
  # runs in. If this field isn't provided, the head pod config below must
  # contain a user-created service account with the proper permissions.
  autoscaler_service_account:
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: autoscaler

  # Role created by the autoscaler for the head node pod that it runs in.
  # If this field isn't provided, the role referenced in
  # autoscaler_role_binding must exist and have at least these permissions.
  autoscaler_role:
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: autoscaler
    rules:
    - apiGroups: [""]
      resources: ["pods", "pods/status", "pods/exec"]
      verbs: ["get", "watch", "list", "create", "delete", "patch"]

  # RoleBinding created by the autoscaler for the head node pod that it runs
  # in. If this field isn't provided, the head pod config below must contain
  # a user-created service account with the proper permissions.
  autoscaler_role_binding:
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: autoscaler
    subjects:
    - kind: ServiceAccount
      name: autoscaler
    roleRef:
      kind: Role
      name: autoscaler
      apiGroup: rbac.authorization.k8s.io

  services:
    # Service that maps to the head node of the Ray cluster.
    - apiVersion: v1
      kind: Service
      metadata:
        # NOTE: If you're running multiple Ray clusters with services
        # on one Kubernetes cluster, they must have unique service
        # names.
        name: example-cluster-ray-head
      spec:
        # This selector must match the head node pod's selector below.
        selector:
          component: example-cluster-ray-head
        ports:
        - name: client
          protocol: TCP
          port: 10001
          targetPort: 10001
        - name: dashboard
          protocol: TCP
          port: 8265
          targetPort: 8265

# Specify the pod type for the ray head node (as configured below).
head_node_type: head_node

# Specify the allowed pod types for this ray cluster and the resources they provide.
available_node_types:
  worker_node:
    # Minimum number of Ray workers of this Pod type.
    min_workers: 50
    # Maximum number of Ray workers of this Pod type. Takes precedence over min_workers.
    max_workers: 80
    # User-specified custom resources for use by Ray. Object with string keys and integer values.
    # (Ray detects CPU and GPU from pod spec resource requests and limits, so no need to fill those here.)
    resources: {"foo": 1, "bar": 2}
    node_config:
      apiVersion: v1
      kind: Pod
      metadata:
        # Automatically generates a name for the pod with this prefix.
        generateName: example-cluster-ray-worker-
      spec:
        restartPolicy: Never
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
        containers:
        - name: ray-node
          imagePullPolicy: Always
          image: rayproject/ray:nightly
          command: ["/bin/bash", "-c", "--"]
          args: ["trap : TERM INT; sleep infinity & wait;"]
          # This volume allocates shared memory for Ray to use for its plasma
          # object store. If you do not provide this, Ray will fall back to
          # /tmp which cause slowdowns if is not a shared memory volume.
          volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          resources:
            requests:
              cpu: 1000m
              memory: 512Mi
            limits:
              # The maximum memory that this pod is allowed to use. The
              # limit will be detected by ray and split to use 10% for
              # redis, 30% for the shared memory object store, and the
              # rest for application memory. If this limit is not set and
              # the object store size is not set manually, ray will
              # allocate a very large object store in each pod that may
              # cause problems for other pods.
              memory: 512Mi
  head_node:
    node_config:
      apiVersion: v1
      kind: Pod
      metadata:
        # Automatically generates a name for the pod with this prefix.
        generateName: example-cluster-ray-head-
        # Must match the head node service selector above if a head node
        # service is required.
        labels:
          component: example-cluster-ray-head
      spec:
        # Change this if you altered the autoscaler_service_account above
        # or want to provide your own.
        serviceAccountName: autoscaler
        restartPolicy: Never
        # This volume allocates shared memory for Ray to use for its plasma
        # object store. If you do not provide this, Ray will fall back to
        # /tmp which cause slowdowns if is not a shared memory volume.
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
        containers:
        - name: ray-node
          imagePullPolicy: Always
          image: rayproject/ray:nightly
          # Do not change this command - it keeps the pod alive until it is
          # explicitly killed.
          command: ["/bin/bash", "-c", "--"]
          args: ['trap : TERM INT; sleep infinity & wait;']
          ports:
          - containerPort: 6379   # Redis port
          - containerPort: 10001  # Used by Ray Client
          - containerPort: 8265   # Used by Ray Dashboard
          # This volume allocates shared memory for Ray to use for its plasma
          # object store. If you do not provide this, Ray will fall back to
          # /tmp which cause slowdowns if is not a shared memory volume.
          volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          resources:
            requests:
              cpu: 1000m
              memory: 512Mi
            limits:
              # The maximum memory that this pod is allowed to use. The
              # limit will be detected by ray and split to use 10% for
              # redis, 30% for the shared memory object store, and the
              # rest for application memory. If this limit is not set and
              # the object store size is not set manually, ray will
              # allocate a very large object store in each pod that may
              # cause problems for other pods.
              memory: 512Mi

# Command to start ray on the head node. You don't need to change this.
# Note dashboard-host is set to 0.0.0.0 so that kubernetes can port forward.
head_start_ray_commands:
  - ray stop
  - ulimit -n 65536; ray start --head --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host 0.0.0.0

# Command to start ray on worker nodes. You don't need to change this.
worker_start_ray_commands:
  - ray stop
  - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379
```
I've been playing around with various min_workers settings and have been seeing inconsistent behavior: e.g., when I set min_workers to 10, the system indeed ended up with 10 pods, but when I changed it to 30, I ended up with only one worker or none at all.
Is there any kind of reproducible set of instructions I can follow to make the min_workers setting work with higher numbers?
Thanks a lot for the quick response. If it helps track down the root cause faster, I'll be happy to have a quick web session to show you things live.
By the way – as part of this exercise I think I stumbled over two more potential bugs; I'd be very interested in your take on them:
1. When I invoke a task many times in parallel, oftentimes there is no provisioning of additional task instances, although there is enough free capacity available.
2. When I annotate a task with `num_cpus=1`, the system doesn't seem to regard the capacity available on my Kubernetes cluster as applicable for running instances of it. Only when I define a custom resource "CPU" in the cluster YAML (roughly as sketched below) is that capacity acknowledged – while I think it should not be required to define a custom resource at all for this.
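For the second point, the workaround looks roughly like this in the worker pod type of the cluster YAML (the value is only illustrative and simply mirrors the worker pod's cpu request of 1000m; per the comment in the example config, Ray is supposed to detect CPU and GPU from the pod spec's resource requests and limits, so my understanding is that this shouldn't be needed):

```
available_node_types:
  worker_node:
    # Workaround: declare "CPU" explicitly as a custom Ray resource.
    # The value is illustrative; it mirrors the worker pod's cpu request of 1000m.
    resources: {"foo": 1, "bar": 2, "CPU": 1}
```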