Launching or bringing down a bare metal cluster hangs indefinitely

alo · September 27, 2023, 9:47am

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

Hello. I have the following cluster configuration:

cluster_name: distmat

provider:
    type: local
    head_ip: "192.168.128.129"
    worker_ips: ["192.168.128.211", "192.168.128.212", "192.168.128.213", "192.168.128.214", "192.168.128.215", "192.168.128.221", "192.168.128.222", "192.168.128.223", "192.168.128.224", "192.168.128.225"]
auth:
    ssh_user: <<user>>

upscaling_speed: 1.0

idle_timeout_minutes: 5

file_mounts_sync_continuously: False

rsync_exclude:
    - "**/.git"
    - "**/.git/**"

rsync_filter:
    - ".gitignore"

head_start_ray_commands:
    - ray stop
    - ulimit -c unlimited && ray start --head --port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379

Running ray up cluster.yaml (with or without the --no-config-cache flag) or ray down cluster.yaml hangs indefinitely until I terminate it manually with Ctrl+C. The problem appeared unexpectedly: the same setup used to work without issues.
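Since ray up and ray down reach every node over SSH, one way to narrow down a hang like this is to check whether each IP in the config still accepts a TCP connection on the SSH port. The following is only a minimal sketch: port 22 and the 3-second timeout are assumptions, and the helper names can_connect and unreachable_nodes are hypothetical, not part of Ray.

```python
import socket

# IPs copied from the cluster config above.
HEAD_IP = "192.168.128.129"
WORKER_IPS = [
    "192.168.128.211", "192.168.128.212", "192.168.128.213",
    "192.168.128.214", "192.168.128.215", "192.168.128.221",
    "192.168.128.222", "192.168.128.223", "192.168.128.224",
    "192.168.128.225",
]


def can_connect(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Port 22 is an assumption (the default SSH port); adjust it if sshd listens elsewhere.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def unreachable_nodes(hosts, port=22, timeout=3.0):
    """Return the subset of `hosts` that did not accept a connection."""
    return [h for h in hosts if not can_connect(h, port, timeout)]


if __name__ == "__main__":
    bad = unreachable_nodes([HEAD_IP, *WORKER_IPS])
    if bad:
        print("No SSH port response from:", ", ".join(bad))
    else:
        print("All nodes accept connections on the SSH port.")
```

A node that shows up as unreachable is a likely place for the launcher to stall. If every node responds, the next thing to look at would be whether a non-interactive SSH login as the configured ssh_user still completes without prompting for a password or host-key confirmation.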

If you have any insights or solutions to this problem, I’d be glad if you’d share them.
Thanks
