Workers in pending state when running Ray with minikube

Hello, I am trying to run Ray with minikube for some tests. The workers that I spawn end up stuck in a pending state. Below is a snippet of the log file; can you please help me resolve the error?

======== Autoscaler status: 2021-04-12 08:19:50.427483 ========
Node status

Healthy:
1 head-node
Pending:
None: worker-node, waiting-for-ssh
None: worker-node, waiting-for-ssh
Recent failures:
(no failures)

Resources

Usage:
0.0/1.0 CPU
0.00/0.350 GiB memory
0.00/0.136 GiB object_store_memory

Demands:
(no resource demands)
example-cluster:2021-04-12 08:19:50,440 DEBUG legacy_info_string.py:24 -- Cluster status: 2 nodes (2 updating)

  • MostDelayedHeartbeats: {'172.17.0.5': 0.1501305103302002}
  • NodeIdleSeconds: Min=41 Mean=41 Max=41
  • ResourceUsage: 0.0/1.0 CPU, 0.0 GiB/0.35 GiB memory, 0.0 GiB/0.14 GiB object_store_memory
  • TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0

Worker node types:
  • worker-node: 2

example-cluster:2021-04-12 08:19:54,663 INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-worker-5xm2k: Running kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
example-cluster:2021-04-12 08:19:54,672 INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-worker-7gbrc: Running kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server (BadRequest): pod example-cluster-ray-worker-5xm2k does not have a host assigned
2021-04-12 08:18:58,714 INFO commands.py:238 -- Cluster: example-cluster
2021-04-12 08:18:58,734 INFO commands.py:301 -- Checking Kubernetes environment settings
2021-04-12 08:18:58,764 INFO commands.py:573 -- No head node found. Launching a new cluster. Confirm [y/N]: y [automatic, due to --yes]
2021-04-12 08:18:58,765 INFO commands.py:618 -- Acquiring an up-to-date head node
2021-04-12 08:18:58,780 INFO commands.py:640 -- Launched a new head node
2021-04-12 08:18:58,780 INFO commands.py:644 -- Fetching the new head node
2021-04-12 08:18:58,787 INFO commands.py:663 -- <1/1> Setting up head node
2021-04-12 08:18:58,801 INFO updater.py:286 -- New status: waiting-for-ssh
2021-04-12 08:18:58,801 INFO updater.py:234 -- [1/7] Waiting for SSH to become available
2021-04-12 08:18:58,801 INFO updater.py:237 -- Running uptime as a test.
2021-04-12 08:18:58,970 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-head-npc4d -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:04,468 SUCC updater.py:245 -- Success.
2021-04-12 08:19:04,468 INFO log_timer.py:27 -- NodeUpdater: example-cluster-ray-head-npc4d: Got remote shell [LogTimer=5667ms]
2021-04-12 08:19:04,475 INFO updater.py:327 -- Updating cluster configuration. [hash=c0249662bba0236e226fdb78e359c1ce573f3257]
2021-04-12 08:19:04,489 INFO updater.py:331 -- New status: syncing-files
2021-04-12 08:19:04,489 INFO updater.py:212 -- [2/7] Processing file mounts
2021-04-12 08:19:04,489 INFO updater.py:229 -- [3/7] No worker file mounts to sync
2021-04-12 08:19:04,500 INFO updater.py:342 -- New status: setting-up
2021-04-12 08:19:04,500 INFO updater.py:380 -- [4/7] No initialization commands to run.
2021-04-12 08:19:04,500 INFO updater.py:384 -- [5/7] Initalizing command runner
2021-04-12 08:19:04,501 INFO updater.py:429 -- [6/7] No setup commands to run.
2021-04-12 08:19:04,501 INFO updater.py:433 -- [7/7] Starting the Ray runtime
2021-04-12 08:19:08,414 INFO log_timer.py:27 -- NodeUpdater: example-cluster-ray-head-npc4d: Ray start commands succeeded [LogTimer=3913ms]
2021-04-12 08:19:08,414 INFO log_timer.py:27 -- NodeUpdater: example-cluster-ray-head-npc4d: Applied config c0249662bba0236e226fdb78e359c1ce573f3257 [LogTimer=9626ms]
2021-04-12 08:19:08,435 INFO updater.py:161 -- New status: up-to-date
2021-04-12 08:19:08,438 INFO commands.py:742 -- Useful commands
2021-04-12 08:19:08,438 INFO commands.py:744 -- Monitor autoscaling with
2021-04-12 08:19:08,438 INFO commands.py:747 -- ray exec /home/ray/ray_cluster_configs/example-cluster_config.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
2021-04-12 08:19:08,438 INFO commands.py:749 -- Connect to a terminal on the cluster head:
2021-04-12 08:19:08,438 INFO commands.py:751 -- ray attach /home/ray/ray_cluster_configs/example-cluster_config.yaml
2021-04-12 08:19:08,438 INFO commands.py:754 -- Get a remote shell to the cluster manually:
2021-04-12 08:19:08,438 INFO commands.py:755 -- kubectl -n ray exec -it example-cluster-ray-head-npc4d -- bash
2021-04-12 08:19:13,812 INFO updater.py:286 -- New status: waiting-for-ssh
2021-04-12 08:19:13,812 INFO updater.py:234 -- [1/7] Waiting for SSH to become available
2021-04-12 08:19:13,812 INFO updater.py:237 -- Running uptime as a test.
2021-04-12 08:19:13,816 INFO updater.py:286 -- New status: waiting-for-ssh
2021-04-12 08:19:13,816 INFO updater.py:234 -- [1/7] Waiting for SSH to become available
2021-04-12 08:19:13,816 INFO updater.py:237 -- Running uptime as a test.
2021-04-12 08:19:13,940 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:13,982 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:19,069 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:19,119 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:24,140 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:24,179 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:29,245 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:29,279 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:34,353 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:34,377 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:39,451 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:39,474 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:44,561 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:44,576 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:49,671 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:49,684 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server (BadRequest): pod example-cluster-ray-worker-7gbrc does not have a host assigned
example-cluster:2021-04-12 08:19:55,556 DEBUG resource_demand_scheduler.py:158 -- Cluster resources: [{'object_store_memory': 145933516.0, 'memory': 375809638.0, 'CPU': 1.0, 'node:172.17.0.5': 1.0}, {'CPU': 1, 'bar': 1, 'foo': 1, 'memory': 375809638}, {'CPU': 1, 'bar': 1, 'foo': 1, 'memory': 375809638}]
example-cluster:2021-04-12 08:19:55,556 DEBUG resource_demand_scheduler.py:159 -- Node counts: defaultdict(<class 'int'>, {'head-node': 1, 'worker-node': 2})
example-cluster:2021-04-12 08:19:55,556 DEBUG resource_demand_scheduler.py:170 -- Placement group demands: []
example-cluster:2021-04-12 08:19:55,556 DEBUG resource_demand_scheduler.py:216 -- Resource demands: []
example-cluster:2021-04-12 08:19:55,557 DEBUG resource_demand_scheduler.py:217 -- Unfulfilled demands: []
example-cluster:2021-04-12 08:19:55,577 DEBUG resource_demand_scheduler.py:239 -- Node requests: {}
example-cluster:2021-04-12 08:19:55,605 INFO autoscaler.py:325 --
======== Autoscaler status: 2021-04-12 08:19:55.605627 ========
Node status

Healthy:
1 head-node
Pending:
None: worker-node, waiting-for-ssh
None: worker-node, waiting-for-ssh
Recent failures:
(no failures)

Resources

Usage:
0.0/1.0 CPU
0.00/0.350 GiB memory
0.00/0.136 GiB object_store_memory

Demands:
(no resource demands)
example-cluster:2021-04-12 08:19:55,618 DEBUG legacy_info_string.py:24 -- Cluster status: 2 nodes (2 updating)

  • MostDelayedHeartbeats: {'172.17.0.5': 0.15698885917663574}
  • NodeIdleSeconds: Min=47 Mean=47 Max=47
  • ResourceUsage: 0.0/1.0 CPU, 0.0 GiB/0.35 GiB memory, 0.0 GiB/0.14 GiB object_store_memory
  • TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0

Worker node types:
  • worker-node: 2

$ kubectl -n ray get pods
NAME                               READY   STATUS    RESTARTS   AGE
example-cluster-ray-head-npc4d     1/1     Running   0          4m42s
example-cluster-ray-worker-5xm2k   0/1     Pending   0          4m32s
example-cluster-ray-worker-7gbrc   0/1     Pending   0          4m32s
ray-operator-pod                   1/1     Running   3          18m

I think it might be due to low resources on my local laptop; I see this message in the log posted above:

Error from server (BadRequest): pod example-cluster-ray-worker-7gbrc does not have a host assigned
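To confirm that this is a scheduling problem rather than something Ray-specific, I believe I can check the scheduler events for one of the pending workers (pod name taken from the kubectl output above):

    $ kubectl -n ray describe pod example-cluster-ray-worker-5xm2k
    $ kubectl -n ray get events --sort-by=.lastTimestamp

If the node really is out of resources, the FailedScheduling events should say something like "Insufficient cpu" or "Insufficient memory".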

cc @Dmitri, can you address the question?

It does indeed look like there are insufficient resources to schedule the worker pods.
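Since minikube schedules everything onto a single node, the two usual remedies are to give the minikube VM more CPU/memory or to shrink the resource requests of the Ray pods. As a rough sketch (adjust the numbers to whatever your laptop can actually spare), recreating minikube with larger limits would look like:

    $ minikube stop
    $ minikube delete
    $ minikube start --cpus 4 --memory 8192

With most drivers the delete/recreate step is needed because the VM's CPU and memory are fixed when it is first created.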

There's a note on resource usage here which you might find helpful:
https://docs.ray.io/en/master/cluster/kubernetes.html#managing-clusters-with-the-ray-kubernetes-operator
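The other lever is the resource requests in the cluster config itself (example-cluster_config.yaml in your case): lowering the per-pod CPU/memory requests lets the worker pods fit on the minikube node. The exact place where these fields live depends on your Ray version, so treat the snippet below as an illustrative Kubernetes container resources block rather than a drop-in edit; the values are only an example:

    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 512Mi

After changing the config you will likely need to tear the cluster down and relaunch it for the new requests to take effect.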
