How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hello, I am having trouble connecting to the head node of a launched cluster on Windows. After running ray up cluster.yaml -vvvv, I get the following output:
Launched a new head node
Fetching the new head node
<1/1> Setting up head node
Prepared bootstrap config
New status: waiting-for-ssh
[1/7] Waiting for SSH to become available
Running uptime as a test.
Fetched IP: XX.XXX.XX.XXX
Running uptime
Full command is ssh -tt -i C:\Users\Mike/.ssh/ray-autoscaler_us-east-2.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=nul -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_d6ac022931/333d66ab16/%C -o ControlPersist=10s -o ConnectTimeout=10s ubuntu@XX.XXX.XX.XXX bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
getsockname failed: Not a socket
I can see the head node running correctly in AWS, but Ray seems unable to connect to it. After looking into the issue, I believe it is caused by the -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_d6ac022931/333d66ab16/%C -o ControlPersist=10s part of the SSH command, but I am not certain. Is there a simple way to remove this part of the command, or any advice on how to fix this?
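One way to check this hypothesis before changing anything in Ray would be to rerun the logged command by hand with the three Control* options removed and see whether uptime then succeeds. The sketch below only builds that reduced command; it does not run it. The helper name strip_multiplexing is hypothetical (it is not part of Ray), and the command string is a shortened copy of the one in the log above, with the -i key path and the bash wrapper left out for brevity.

```python
# Options suspected of causing the failure, as they appear in the logged command.
MULTIPLEX_KEYS = ("ControlMaster", "ControlPath", "ControlPersist")


def strip_multiplexing(args):
    """Return a copy of an ssh argument list without the '-o Control*=...' pairs.

    Hypothetical helper for manual testing; not a Ray API.
    """
    cleaned = []
    i = 0
    while i < len(args):
        is_option = args[i] == "-o" and i + 1 < len(args)
        if is_option and args[i + 1].split("=", 1)[0] in MULTIPLEX_KEYS:
            i += 2  # skip both "-o" and its Control* value
            continue
        cleaned.append(args[i])
        i += 1
    return cleaned


# Shortened copy of the logged command (assumption: no quoted arguments,
# so a plain whitespace split is enough here).
logged = (
    "ssh -tt -o StrictHostKeyChecking=no -o ControlMaster=auto "
    "-o ControlPath=/tmp/ray_ssh_d6ac022931/333d66ab16/%C "
    "-o ControlPersist=10s -o ConnectTimeout=10s ubuntu@XX.XXX.XX.XXX uptime"
)

test_args = strip_multiplexing(logged.split())
print(" ".join(test_args))
# prints: ssh -tt -o StrictHostKeyChecking=no -o ConnectTimeout=10s ubuntu@XX.XXX.XX.XXX uptime
```

If the reduced command connects from the same Windows shell while the full one fails with getsockname failed: Not a socket, that would point at the multiplexing options rather than at the key, security group, or network setup.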