How severely does this issue affect your experience of using Ray?
High: It blocks me from completing my task.
Hi Community,
We use the Ray Helm chart (ray/deploy/charts at master · ray-project/ray · GitHub) to deploy a Ray cluster on Azure, but only the ray-operator and head node pods were created; the worker pods are missing. Could anyone help us with this issue?
The detailed logs of ray operator can be found here: ray_operator_log.log - Google Drive
The main error says "failed to connect to all addresses".
@Rui I was not able to reproduce the problem with the default configuration on a local kind cluster.
It is possible that the issue is related to network settings in your Kubernetes cluster. The operator needs to make RPC requests to the Ray head node, which by default has a server listening on port 6379.
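One rough way to test this is a plain TCP check against the head node, run from inside the operator pod. The sketch below uses only the Python standard library. The host name `example-cluster-ray-head` is a placeholder assumption, not something taken from your deployment, so substitute the name of your head service or the head pod IP. A successful TCP connection only shows that the port is reachable; it does not prove that the operator's RPC calls will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host name (assumption); replace with your head service or pod IP.
    head_host = "example-cluster-ray-head"
    # 6379 is the default head node port mentioned above.
    print(can_connect(head_host, 6379))
```

If this prints `False` from the operator pod but `True` from the head pod itself, that would point toward a network restriction between the pods rather than a problem with the Ray head process.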
@Dmitri After checking, I found that our Kubernetes cluster's network policy is set to default-deny-all mode, so I think we need to add a network policy to the Ray chart. Are there any docs on which ports and connections between the operator, head, and worker pods need to be open?