Workers in pending state when running Ray with minikube

Hello, I am trying to run Ray with minikube for some tests. The workers that I spawn end up stuck in a pending state. Below is a snippet of the log file; can you please help me resolve the error?

======== Autoscaler status: 2021-04-12 08:19:50.427483 ========
Node status

Healthy:
1 head-node
Pending:
None: worker-node, waiting-for-ssh
None: worker-node, waiting-for-ssh
Recent failures:
(no failures)

Resources

Usage:
0.0/1.0 CPU
0.00/0.350 GiB memory
0.00/0.136 GiB object_store_memory

Demands:
(no resource demands)
example-cluster:2021-04-12 08:19:50,440 DEBUG legacy_info_string.py:24 -- Cluster status: 2 nodes (2 updating)

  • MostDelayedHeartbeats: {'172.17.0.5': 0.1501305103302002}
  • NodeIdleSeconds: Min=41 Mean=41 Max=41
  • ResourceUsage: 0.0/1.0 CPU, 0.0 GiB/0.35 GiB memory, 0.0 GiB/0.14 GiB object_store_memory
  • TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0

Worker node types:
  • worker-node: 2

example-cluster:2021-04-12 08:19:54,663 INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-worker-5xm2k: Running kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
example-cluster:2021-04-12 08:19:54,672 INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-worker-7gbrc: Running kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server (BadRequest): pod example-cluster-ray-worker-5xm2k does not have a host assigned
2021-04-12 08:18:58,714 INFO commands.py:238 -- Cluster: example-cluster
2021-04-12 08:18:58,734 INFO commands.py:301 -- Checking Kubernetes environment settings
2021-04-12 08:18:58,764 INFO commands.py:573 -- No head node found. Launching a new cluster. Confirm [y/N]: y [automatic, due to --yes]
2021-04-12 08:18:58,765 INFO commands.py:618 -- Acquiring an up-to-date head node
2021-04-12 08:18:58,780 INFO commands.py:640 -- Launched a new head node
2021-04-12 08:18:58,780 INFO commands.py:644 -- Fetching the new head node
2021-04-12 08:18:58,787 INFO commands.py:663 -- <1/1> Setting up head node
2021-04-12 08:18:58,801 INFO updater.py:286 -- New status: waiting-for-ssh
2021-04-12 08:18:58,801 INFO updater.py:234 -- [1/7] Waiting for SSH to become available
2021-04-12 08:18:58,801 INFO updater.py:237 -- Running uptime as a test.
2021-04-12 08:18:58,970 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-head-npc4d -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:04,468 SUCC updater.py:245 -- Success.
2021-04-12 08:19:04,468 INFO log_timer.py:27 -- NodeUpdater: example-cluster-ray-head-npc4d: Got remote shell [LogTimer=5667ms]
2021-04-12 08:19:04,475 INFO updater.py:327 -- Updating cluster configuration. [hash=c0249662bba0236e226fdb78e359c1ce573f3257]
2021-04-12 08:19:04,489 INFO updater.py:331 -- New status: syncing-files
2021-04-12 08:19:04,489 INFO updater.py:212 -- [2/7] Processing file mounts
2021-04-12 08:19:04,489 INFO updater.py:229 -- [3/7] No worker file mounts to sync
2021-04-12 08:19:04,500 INFO updater.py:342 -- New status: setting-up
2021-04-12 08:19:04,500 INFO updater.py:380 -- [4/7] No initialization commands to run.
2021-04-12 08:19:04,500 INFO updater.py:384 -- [5/7] Initalizing command runner
2021-04-12 08:19:04,501 INFO updater.py:429 -- [6/7] No setup commands to run.
2021-04-12 08:19:04,501 INFO updater.py:433 -- [7/7] Starting the Ray runtime
2021-04-12 08:19:08,414 INFO log_timer.py:27 -- NodeUpdater: example-cluster-ray-head-npc4d: Ray start commands succeeded [LogTimer=3913ms]
2021-04-12 08:19:08,414 INFO log_timer.py:27 -- NodeUpdater: example-cluster-ray-head-npc4d: Applied config c0249662bba0236e226fdb78e359c1ce573f3257 [LogTimer=9626ms]
2021-04-12 08:19:08,435 INFO updater.py:161 -- New status: up-to-date
2021-04-12 08:19:08,438 INFO commands.py:742 -- Useful commands
2021-04-12 08:19:08,438 INFO commands.py:744 -- Monitor autoscaling with
2021-04-12 08:19:08,438 INFO commands.py:747 -- ray exec /home/ray/ray_cluster_configs/example-cluster_config.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
2021-04-12 08:19:08,438 INFO commands.py:749 -- Connect to a terminal on the cluster head:
2021-04-12 08:19:08,438 INFO commands.py:751 -- ray attach /home/ray/ray_cluster_configs/example-cluster_config.yaml
2021-04-12 08:19:08,438 INFO commands.py:754 -- Get a remote shell to the cluster manually:
2021-04-12 08:19:08,438 INFO commands.py:755 -- kubectl -n ray exec -it example-cluster-ray-head-npc4d -- bash
2021-04-12 08:19:13,812 INFO updater.py:286 -- New status: waiting-for-ssh
2021-04-12 08:19:13,812 INFO updater.py:234 -- [1/7] Waiting for SSH to become available
2021-04-12 08:19:13,812 INFO updater.py:237 -- Running uptime as a test.
2021-04-12 08:19:13,816 INFO updater.py:286 -- New status: waiting-for-ssh
2021-04-12 08:19:13,816 INFO updater.py:234 -- [1/7] Waiting for SSH to become available
2021-04-12 08:19:13,816 INFO updater.py:237 -- Running uptime as a test.
2021-04-12 08:19:13,940 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:13,982 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:19,069 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:19,119 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:24,140 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:24,179 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:29,245 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:29,279 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:34,353 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:34,377 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:39,451 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:39,474 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:44,561 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:44,576 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:49,671 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-5xm2k -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-04-12 08:19:49,684 INFO updater.py:277 -- SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-worker-7gbrc -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server (BadRequest): pod example-cluster-ray-worker-7gbrc does not have a host assigned
example-cluster:2021-04-12 08:19:55,556 DEBUG resource_demand_scheduler.py:158 -- Cluster resources: [{'object_store_memory': 145933516.0, 'memory': 375809638.0, 'CPU': 1.0, 'node:172.17.0.5': 1.0}, {'CPU': 1, 'bar': 1, 'foo': 1, 'memory': 375809638}, {'CPU': 1, 'bar': 1, 'foo': 1, 'memory': 375809638}]
example-cluster:2021-04-12 08:19:55,556 DEBUG resource_demand_scheduler.py:159 -- Node counts: defaultdict(<class 'int'>, {'head-node': 1, 'worker-node': 2})
example-cluster:2021-04-12 08:19:55,556 DEBUG resource_demand_scheduler.py:170 -- Placement group demands: []
example-cluster:2021-04-12 08:19:55,556 DEBUG resource_demand_scheduler.py:216 -- Resource demands: []
example-cluster:2021-04-12 08:19:55,557 DEBUG resource_demand_scheduler.py:217 -- Unfulfilled demands: []
example-cluster:2021-04-12 08:19:55,577 DEBUG resource_demand_scheduler.py:239 -- Node requests: {}
example-cluster:2021-04-12 08:19:55,605 INFO autoscaler.py:325 --
======== Autoscaler status: 2021-04-12 08:19:55.605627 ========
Node status

Healthy:
1 head-node
Pending:
None: worker-node, waiting-for-ssh
None: worker-node, waiting-for-ssh
Recent failures:
(no failures)

Resources

Usage:
0.0/1.0 CPU
0.00/0.350 GiB memory
0.00/0.136 GiB object_store_memory

Demands:
(no resource demands)
example-cluster:2021-04-12 08:19:55,618 DEBUG legacy_info_string.py:24 -- Cluster status: 2 nodes (2 updating)

  • MostDelayedHeartbeats: {'172.17.0.5': 0.15698885917663574}
  • NodeIdleSeconds: Min=47 Mean=47 Max=47
  • ResourceUsage: 0.0/1.0 CPU, 0.0 GiB/0.35 GiB memory, 0.0 GiB/0.14 GiB object_store_memory
  • TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0

Worker node types:
  • worker-node: 2

$ kubectl -n ray get pods
NAME                               READY   STATUS    RESTARTS   AGE
example-cluster-ray-head-npc4d     1/1     Running   0          4m42s
example-cluster-ray-worker-5xm2k   0/1     Pending   0          4m32s
example-cluster-ray-worker-7gbrc   0/1     Pending   0          4m32s
ray-operator-pod                   1/1     Running   3          18m

I think it might be due to low resources on my local laptop; I see this message in the log posted above:

Error from server (BadRequest): pod example-cluster-ray-worker-7gbrc does not have a host assigned
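To confirm that this is a scheduling problem rather than something Ray-specific, I believe I can check the scheduler events for one of the pending workers (pod name taken from the kubectl output above):

    $ kubectl -n ray describe pod example-cluster-ray-worker-5xm2k
    $ kubectl -n ray get events --sort-by=.lastTimestamp

If the node really is out of resources, the FailedScheduling events should say something like "Insufficient cpu" or "Insufficient memory".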

cc @Dmitri, can you address the question?

It does indeed look like there are insufficient resources to schedule the worker pods.
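Since minikube schedules everything onto a single node, the two usual remedies are to give the minikube VM more CPU/memory or to shrink the resource requests of the Ray pods. As a rough sketch (adjust the numbers to whatever your laptop can actually spare), recreating minikube with larger limits would look like:

    $ minikube stop
    $ minikube delete
    $ minikube start --cpus 4 --memory 8192

With most drivers the delete/recreate step is needed because the VM's CPU and memory are fixed when it is first created.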

There's a note on resource usage here which you might find helpful:
https://docs.ray.io/en/master/cluster/kubernetes.html#managing-clusters-with-the-ray-kubernetes-operator
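The other lever is the resource requests in the cluster config itself (example-cluster_config.yaml in your case): lowering the per-pod CPU/memory requests lets the worker pods fit on the minikube node. The exact place where these fields live depends on your Ray version, so treat the snippet below as an illustrative Kubernetes container resources block rather than a drop-in edit; the values are only an example:

    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 512Mi

After changing the config you will likely need to tear the cluster down and relaunch it for the new requests to take effect.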
