Cross-language doesn't work for a k8s cluster?

blshao84 · August 25, 2021, 12:25am

Let me first explain my use case. I have a Java application that submits tasks to a Python Ray cluster running on K8s (following this link: Deploying on Kubernetes — Ray v2.0.0.dev0). In this case, the Java app is really just a “client”, and I have already solved all serialization issues. It works when I set up a local cluster, but I get confusing error messages when using the k8s one.

First of all, if I run my Java app from my laptop, it throws the exception below when calling Ray.init(). It looks to me like my Java app is being treated as a worker node, but it's really just a client. I can't find any 'client' API in Java like 'ray.client().connect()' in Python.

05:27:44.043 [DefaultDispatcher-worker-1] INFO  c.c.r.v.RemoteSingleAssetCmsLocalVolPA w/interface - sending request to grid for trade100
05:27:56.774 [main] ERROR i.r.runtime.DefaultRayRuntimeFactory - Failed to initialize ray runtime, with config {"ray":{"address":"10.23.113.84:6379","head-args":[],"job":{"code-search-path":"/home/ray/analytics-py-bct/bct/distributed/ray","id":"","jvm-options":[],"num-java-workers-per-process":1,"worker-env":{}},"logging":{"dir":"","level":"INFO","max-backup-files":10,"max-file-size":"500MB","pattern":"%d{yyyy-MM-dd HH:mm:ss,SSS} %p %c{1} [%t]: %m%n"},"object-store":{"socket-name":null},"raylet":{"node-manager-port":0,"socket-name":null},"redis":{"password":"5241590000000000"},"run-mode":"CLUSTER","session-dir":"/tmp/ray/session_2021-08-24_03-22-32_402612_114"}}
java.lang.RuntimeException: Failed to get address info. Output: null
        at io.ray.runtime.runner.RunManager.getAddressInfoAndFillConfig(RunManager.java:88)
        at io.ray.runtime.RayNativeRuntime.start(RayNativeRuntime.java:79)
        at io.ray.runtime.DefaultRayRuntimeFactory.createRayRuntime(DefaultRayRuntimeFactory.java:39)
        at io.ray.api.Ray.init(Ray.java:39)
        at io.ray.api.Ray.init(Ray.java:26)
        ...
Caused by: java.lang.RuntimeException: The exit value of the process is 1. Command: python -c import ray; print(ray._private.services.get_address_info_from_redis('10.23.113.84:6379', '10.42.9.133', redis_password='5241590000000000', log_warning=False))
output:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 310, in get_address_info_from_redis
    redis_address, node_ip_address, redis_password=redis_password)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 284, in get_address_info_from_redis_helper
    f"This node has an IP address of {node_ip_address}, and Ray "
RuntimeError: This node has an IP address of xxxx, and Ray expects this IP address to be either the Redis address or one of the Raylet addresses. Connected to Redis at 10.23.113.84:6379 and found raylets at ... but none of these match this node's IP 10.42.9.133. Are any of these actually a different IP address for the same node?You might need to provide --node-ip-address to specify the IP address that the head should use when sending to this node.
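Reading the stack trace, the Java runtime in "run-mode":"CLUSTER" shells out to Python and calls get_address_info_from_redis with the local IP (10.42.9.133). Per the error text, that call only succeeds if the IP is the Redis host or belongs to a raylet already registered in the cluster, which a laptop is not. Below is a minimal sketch that restates that condition in plain Python. It is not Ray's code: node_ip_matches_cluster and local_ip_toward are hypothetical helpers, and the raylet list is left empty because the log elides it.

```python
import socket


def local_ip_toward(host: str, port: int) -> str:
    """Return the local IP the OS would use as the source address toward host:port.

    connect() on a UDP socket sends no packets; it only makes the kernel pick
    a route, after which getsockname() reports the chosen local address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((host, port))
        return s.getsockname()[0]


def node_ip_matches_cluster(node_ip: str, redis_address: str, raylet_ips: list) -> bool:
    """Hypothetical restatement of the check described in the RuntimeError:
    the caller's IP must be the Redis host or the IP of a known raylet."""
    redis_host = redis_address.rsplit(":", 1)[0]  # strip the ":6379" port
    return node_ip == redis_host or node_ip in raylet_ips


# Values from the log above; the raylet IPs are elided there, so none are listed.
print(node_ip_matches_cluster("10.42.9.133", "10.23.113.84:6379", []))  # False
```

On the laptop, local_ip_toward("10.23.113.84", 6379) would show which local address faces the head node, but whatever it reports, the check can only pass on a machine that already runs a raylet. That would be consistent with the second experiment below, where Ray.init() succeeds inside a worker pod.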

Then I also tried running my Java app in one of the worker pods of my k8s cluster. In this case, it is able to connect to the cluster with Ray.init() and send tasks. But from what I observed in the dashboard, all tasks (hundreds in my case) are scheduled on the node my Java app is running on, and it doesn't take long before that node crashes from running out of memory.
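One plausible reading, not confirmed in this thread: if the scheduler prefers the submitting node while that node still has free *declared* resources, and each task declares a CPU but no memory (Ray does not reserve memory for a task unless a memory requirement is declared), then the driver's node fills up by CPU count while its real memory runs out. The toy model below illustrates that effect. It is not Ray's scheduler, and the node sizes, the 3 GB per-task footprint, and the names Node and place_local_first are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: dict                       # resources the scheduler knows about
    used: dict = field(default_factory=dict)

    def fits(self, demand: dict) -> bool:
        # Only resources named in the demand are checked; undeclared ones are ignored.
        return all(self.used.get(r, 0) + amt <= self.capacity.get(r, 0)
                   for r, amt in demand.items())

    def take(self, demand: dict) -> None:
        for r, amt in demand.items():
            self.used[r] = self.used.get(r, 0) + amt


def place_local_first(nodes, local, demand, n_tasks):
    """Toy placement: each task goes to the submitting node if its declared
    demand fits there, else to the first other node with room.
    Returns {node name: number of concurrently running tasks}."""
    counts = {n.name: 0 for n in nodes}
    order = [local] + [n for n in nodes if n is not local]
    for _ in range(n_tasks):
        for node in order:
            if node.fits(demand):
                node.take(demand)
                counts[node.name] += 1
                break
        # A task that fits nowhere would wait in a queue; the toy just skips it.
    return counts


def make_cluster():
    # Hypothetical pods with 16 CPUs and 32 GB each; "pod-a" runs the Java driver.
    return [Node("pod-a", {"CPU": 16, "memory_gb": 32}),
            Node("pod-b", {"CPU": 16, "memory_gb": 32})]


TASK_MEMORY_GB = 3  # hypothetical real memory use of one task

cpu_only = make_cluster()
print(place_local_first(cpu_only, cpu_only[0], {"CPU": 1}, 20))
# -> {'pod-a': 16, 'pod-b': 4}: 16 * 3 GB = 48 GB really used on a 32 GB pod

with_mem = make_cluster()
print(place_local_first(with_mem, with_mem[0], {"CPU": 1, "memory_gb": TASK_MEMORY_GB}, 20))
# -> {'pod-a': 10, 'pod-b': 10}: declaring memory makes the extra tasks spill over
```

If this is what happened, declaring a realistic memory requirement for the tasks (or reserving more CPUs per task) would be the corresponding knob to try. That is a hypothesis to check on the dashboard, not a confirmed fix.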

I would say my use case is probably the most common one in distributed computing, and it should be easy to achieve. Could anyone shed some light on how I should do this with Ray? I can provide more detail about my use case or the error if needed.

Thanks,
-BS
