Hi, I used the following two commands to set up a Ray cluster on GCP, the first on the head node and the second on each worker node.
ulimit -n 65536; ray start --head --port=6379 --no-monitor --include-dashboard=False
ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379
However, I found that ray.get is two times slower on the manually set up cluster than on the k8s Ray cluster. I used Ray's Helm chart to start the k8s Ray cluster. Is there any trick to improve the speed of the manually set up cluster? The specs of both clusters are the same.
ray.get is two times slower in the manually set cluster than that in k8s ray cluster
Can you share a reproduction script for this? For example, are you calling ray.get on a large object or on the result of a remote function?
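As a starting point, a reproduction script could look roughly like the sketch below. It is only an illustration, not the original poster's workload: the helper names (time_calls, summarize, run_benchmark), the 100 MB payload, and the repeat count are all assumptions. It times ray.get on an object already stored with ray.put and on the result of a remote task, so the same script can be run on both clusters and the numbers compared.

```python
import statistics
import time


def time_calls(fn, repeats=5):
    """Call fn() `repeats` times; return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def run_benchmark(payload_mb=100, repeats=5):
    """Hypothetical benchmark: payload size and repeat count are arbitrary choices."""
    import ray  # imported here so the timing helpers work without Ray installed

    ray.init(address="auto")  # attach to the cluster that is already running

    @ray.remote
    def make_payload(n_bytes):
        # May be scheduled on any node, so the result may cross the network.
        return bytes(n_bytes)

    n_bytes = payload_mb * 1024 * 1024
    stored_ref = ray.put(bytes(n_bytes))  # object placed in the local object store

    def get_stored_object():
        ray.get(stored_ref)

    def get_task_result():
        # Note: this measures task execution plus the ray.get itself.
        ray.get(make_payload.remote(n_bytes))

    print("ray.get on ray.put object:", summarize(time_calls(get_stored_object, repeats)))
    print("ray.get on task result:   ", summarize(time_calls(get_task_result, repeats)))
```

Running run_benchmark() on the head node of each cluster would give two sets of timings to post here. If only the task-result case is slower, that would point toward networking or scheduling rather than the object store.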
The details of the networking setup could make a difference here.
For example, is it possible that in your K8s setup the head and worker were scheduled as pods on the same K8s node? That would reduce network overhead vs. a head and worker running on separate virtual machines.
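One way to check this would be to print what Ray itself reports about each node on both clusters. The sketch below is an assumption-laden example: describe_nodes and print_cluster_layout are hypothetical helper names, and the exact keys and the units of the memory resource in the ray.nodes() output can differ between Ray versions, so the values are printed as reported.

```python
def describe_nodes(nodes):
    """Summarize alive nodes from a ray.nodes()-style list of dicts."""
    rows = []
    for node in nodes:
        if not node.get("Alive", False):
            continue  # skip nodes that have left the cluster
        resources = node.get("Resources", {})
        rows.append({
            "address": node.get("NodeManagerAddress"),
            "cpus": resources.get("CPU", 0),
            "memory": resources.get("memory"),  # raw value; units depend on Ray version
        })
    return rows


def print_cluster_layout():
    import ray  # imported here so describe_nodes can be used without Ray installed

    ray.init(address="auto")  # attach to the running cluster
    for row in describe_nodes(ray.nodes()):
        print(row)
```

If the K8s cluster shows the head and worker behind addresses on the same underlying machine, or reports different CPU counts than the VM-based cluster, that would help explain the difference.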
Hi @Dmitri, both the k8s and the GCP clusters have the same settings. I have a data buffer running on the head node, and ray.get happens on the head node. I think the CPUs and RAM may be weaker.