I am trying to get Ray running on a GKE cluster so that certain processes run on my local machine (specifically, to open pygame windows there to visualize what’s running remotely). I set the RayCluster config to use a LoadBalancer so I can connect to the external IP from my laptop, and opened ports 12345, 12346, and 52365 through the firewall (I think I saw somewhere that these were the correct ports for GCS and logging; can someone confirm?).
When I run ray start --address=LoadManagerIp:6379 --node-ip-address=<laptop external IP from web>, I think my machine correctly connects to the Kubernetes cluster, since the dashboard at LoadManagerIp:8265 shows my laptop’s name as a worker. However, I am unable to view its logs from the dashboard, and running ray memory or ray status on the laptop gives the error:
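To make the port question concrete, here is a minimal sketch of how TCP reachability of those ports could be checked from the laptop. The helper names port_is_open and check_ports are hypothetical, and the host in the usage note is a placeholder for the load balancer IP. It only tests the laptop-to-cluster direction; it says nothing about whether the cluster can reach back to the laptop.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (refused, unreachable, timed out) if it cannot complete.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling check_ports("<load balancer IP>", [6379, 8265, 12345, 12346, 52365]) returns a dict such as {6379: True, ...}; any port mapped to False is blocked or has nothing listening on it.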
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = ""
debug_error_string = "{"created":"@1666750368.505000000","description":"Error received from peer ipv4:<correct IP>:6379","file":"src/core/lib/surface/call.cc","file_line":953,"grpc_message":"","grpc_status":12}"
>
Ray cluster is not found at <correct IP>:6379
To test whether the cluster would try to schedule on the local machine, I added a custom resource “display”:1 to ray start and wrote a test script that runs a remote function requiring the display resource. Before my laptop connects, the Ray head correctly outputs:
Error: No available node types can fulfill resource request {'Display': 1.0, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.
However, when my laptop connects, there’s no output at all. I don’t see any new Python processes on my computer, and it seems that no logs are being passed to the driver, so I can’t tell what’s going on. How can I see what’s passing between the head node and my machine? Does anyone know why the task won’t run, or what other tests I could try?
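For reference, a minimal sketch of the kind of test script described above. It is not the exact script: the resource name "Display" is taken from the error message and is assumed to match what was passed to ray start --resources, address="auto" assumes the driver runs on a node that is already part of the cluster, and the 30 second timeout is arbitrary.

```python
import socket


def display_task() -> str:
    """Report the hostname of the node the task was scheduled on."""
    return socket.gethostname()


def main() -> None:
    # Imported here so the file can be loaded on a machine without Ray.
    import ray

    # Assumption: connect to an already running cluster from one of its nodes.
    ray.init(address="auto")

    # Require one CPU plus one unit of the custom resource (assumed name).
    remote_task = ray.remote(num_cpus=1, resources={"Display": 1})(display_task)
    ref = remote_task.remote()

    # Wait with a timeout instead of blocking forever on ray.get.
    ready, _pending = ray.wait([ref], timeout=30)
    if ready:
        print("Task ran on:", ray.get(ready[0]))
    else:
        print("Task still pending after 30 s; it was never scheduled or never finished.")
```

Calling main() from the driver prints the hostname of the node that ran the task, which should be the laptop if the resource is only advertised there. A timeout message instead means the task is stuck somewhere between scheduling and completion.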