How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
Dear all,
I am new to Ray, but I have been playing with it for the last 2 weeks. I have some tasks that I want to parallelize across my company's servers, and Ray looks like a great tool for that. The only issue is that our best servers run Windows, and from my research I understand that Ray is not yet ready to run distributed tasks across multiple Windows machines.
So, to take advantage of the hardware in those servers, I was thinking about using WSL (Windows Subsystem for Linux) to work around the Windows limitations.
Because of firewall issues on Windows, I decided to set up the Ray head on a pure Linux server we have. This server is only meant to manage the Ray cluster; the more demanding computational tasks will be done by Ray workers running in WSL.
I am using Ray version 2.34.0 on all machines.
I successfully started the Ray head, and everything looks fine on the dashboard. Then, when I try to set up a node in WSL, the worker initially appears on the dashboard, but after a few seconds it is killed due to missing heartbeats.
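One way to narrow down a symptom like this could be to check, from the head node, whether the ports the worker advertises accept TCP connections at all. The following is only a minimal sketch using the Python standard library; the function names are hypothetical, and the host and ports in the usage comment are placeholders rather than values from this setup.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports, timeout: float = 2.0) -> dict:
    """Map each port in `ports` to whether it is reachable on `host`."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Example (placeholder values, to be replaced with the Windows host IP
# and the ports the worker was started with):
# print(check_ports("192.0.2.10", range(10002, 10006)))
```

A port that shows up as closed from the head node but open from inside WSL would point at the Windows host or the forwarding rule rather than at Ray itself.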
I have tried a Linux-Linux configuration and everything works as expected. The worker is added to the cluster and the connection is stable.
For the WSL configuration, I have already tried a couple of things. I disabled the firewall, but without success. With netsh, I added a port-forwarding rule to forward traffic on a range of ports from my Windows machine to WSL (I tested starting the Ray head with the --worker-port-list option to define the list of ports). The IP address used to start the Ray worker is the one of my Windows machine, since that is the one visible on the network. But nothing is working…
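To make the forwarding step concrete, here is a minimal sketch that generates one rule per port. It assumes the rules are of the `netsh interface portproxy add v4tov4` kind, which the post does not state explicitly; the helper name is hypothetical, and the WSL address and port numbers are placeholders.

```python
def portproxy_commands(ports, wsl_ip, listen_address="0.0.0.0"):
    """Build one netsh portproxy command per port.

    Each rule forwards TCP traffic arriving on the Windows host at
    listen_address:port to the same port on the WSL instance.
    The commands are only returned as strings, not executed.
    """
    commands = []
    for port in ports:
        parts = [
            "netsh interface portproxy add v4tov4",
            f"listenport={port}",
            f"listenaddress={listen_address}",
            f"connectport={port}",
            f"connectaddress={wsl_ip}",
        ]
        commands.append(" ".join(parts))
    return commands


# Example (placeholder values): print the rules for four worker ports.
# for cmd in portproxy_commands(range(10002, 10006), "172.20.0.2"):
#     print(cmd)
```

The printed commands would need to be run from an elevated prompt on the Windows host. Note that portproxy only forwards TCP, and that the WSL address can change after a restart, so the rules may need to be regenerated.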
At this stage, I have no idea what to try next, and it would be great to have some help from you guys.
Is there anything I am missing? I am a newbie at this networking stuff, so there is a high chance I am making some silly mistake…
Thanks for the help.
Andre