How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
Hi everyone, I am launching my cluster from a YAML file with a custom pem file, configured as described at the end of this post. The cluster itself launches fine. However, when the head node launches a new worker node, the worker is created with the custom key pair, but the head node then tries to SSH into it with the default ~/ray_bootstrap_key.pem file.
Because of this, the head node cannot SSH into the worker node after it is launched and cannot submit jobs to it, and the following errors appear:
2023-06-05 16:03:06,856 VVINFO command_runner.py:374 -- Full command is `ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_070dd72385/a1664639b9/%C -o ControlPersist=10s -o ConnectTimeout=10s ubuntu@10.0.1.213 bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'`
==> /tmp/ray/session_latest/logs/monitor.err <==
ssh: connect to host 10.0.1.100 port 22: Connection timed out
==> /tmp/ray/session_latest/logs/monitor.log <==
2023-06-05 16:03:07,216 INFO autoscaler.py:148 -- The autoscaler took 0.069 seconds to fetch the list of non-terminated nodes.
2023-06-05 16:03:07,216 INFO autoscaler.py:423 --
======== Autoscaler status: 2023-06-05 16:03:07.216757 ========
Node status
---------------------------------------------------------------
Healthy:
1 head_node_r5
Pending:
10.0.1.213: ray.worker.p2x, waiting-for-ssh
10.0.1.100: ray.worker.p2x, waiting-for-ssh
Recent failures:
(no failures)
How can I force the Ray autoscaler to use the same custom pem file for SSH as well?
I specify my custom key in the ssh_private_key parameter, and I also set KeyName in the node config of both the head node and the worker nodes.
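
For reference, the setup described above looks roughly like the fragment below. This is only a minimal sketch, assuming the AWS provider. The cluster name, region, instance types, key name, key path, and worker limits are placeholders, not the values from the actual cluster file. Only the node type names and the ubuntu user are taken from the log output above.

```yaml
# Illustrative fragment only; every value marked "placeholder" is an assumption.
cluster_name: example-cluster            # placeholder

provider:
  type: aws
  region: us-west-2                      # placeholder

auth:
  ssh_user: ubuntu                       # user seen in the ssh command in the log
  ssh_private_key: /path/to/my-custom-key.pem   # placeholder path to the custom pem file

available_node_types:
  head_node_r5:                          # node type name from the autoscaler status
    node_config:
      InstanceType: r5.xlarge            # placeholder
      KeyName: my-custom-key             # placeholder; EC2 key pair matching the pem file
  ray.worker.p2x:                        # node type name from the autoscaler status
    min_workers: 0                       # placeholder
    max_workers: 2                       # placeholder
    node_config:
      InstanceType: p2.xlarge            # placeholder
      KeyName: my-custom-key             # placeholder; same key pair as the head node

head_node_type: head_node_r5
```

The point of the sketch is that the same key pair is referenced in three places: ssh_private_key under auth, and KeyName in the node_config of both node types.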