Ray.get is slower in manually set cluster than that in k8s-based clusters

GoingMyWay · July 20, 2022, 1:04pm

Hi, I used the following two commands to set up a Ray cluster on GCP, the first on the head node and the second on each worker node.

ulimit -n 65536; ray start --head --port=6379 --no-monitor --include-dashboard=False
ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379

However, I found that ray.get is two times slower on the manually set up cluster than on the k8s Ray cluster. I used Ray's Helm chart to start the k8s Ray cluster. Is there any trick to improve the speed of a manually set up cluster? The specs of the two clusters are the same.

ckw017 · July 22, 2022, 4:59pm

I found that ray.get is two times slower on the manually set up cluster than on the k8s Ray cluster

Can you share a reproduction script for this? For example, are you calling ray.get on a large object or on the result of a remote function?
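A reproduction script along these lines might look like the sketch below. It is not the original poster's workload: the helper names (`measure`, `summarize`, `run_benchmark`, `make_array`) and the two array sizes are assumptions chosen for illustration. It waits for each task to finish before starting the timer, so that only the `ray.get` call is measured, and it creates a fresh object for every repeat so that a locally cached copy does not hide the transfer cost.

```python
import statistics
import time


def measure(prepare, action, repeats=5):
    """Time action(arg) `repeats` times; prepare() runs untimed before each call."""
    durations = []
    for _ in range(repeats):
        arg = prepare()                      # untimed setup (e.g. create the object)
        start = time.perf_counter()
        action(arg)                          # timed call (e.g. ray.get)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, median and max of a list of durations in seconds."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def run_benchmark(sizes=(1_000, 100_000_000), repeats=5):
    """Time ray.get for a small and a large array (sizes are assumptions)."""
    import numpy as np
    import ray

    ray.init(address="auto")                 # attach to the already running cluster

    @ray.remote
    def make_array(n_bytes):
        return np.zeros(n_bytes, dtype=np.uint8)

    def prepare(n_bytes):
        ref = make_array.remote(n_bytes)
        # Wait until the task has finished, without pulling the object locally.
        ray.wait([ref], fetch_local=False)
        return ref

    for n_bytes in sizes:
        durations = measure(lambda: prepare(n_bytes), ray.get, repeats)
        print(n_bytes, summarize(durations))
```

Calling `run_benchmark()` from a Python session on the head node of each cluster gives numbers that can be compared directly. Note that if Ray schedules `make_array` on the head node itself, no network transfer takes place, so the result says little about inter-node speed.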

cc @Dmitri

Dmitri · July 22, 2022, 5:37pm

The details of the networking setup could make a difference here.

For example, is it possible that in your K8s setup the head and worker were scheduled as pods on the same K8s node? That would reduce network overhead vs. a head and worker running on separate virtual machines.
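One way to check this, sketched here under assumptions, is to group the Ray pods by the K8s node they were scheduled on. The function names are hypothetical; the input is assumed to be the JSON printed by `kubectl get pods -o json` for the namespace that holds the Ray pods.

```python
import json
from collections import defaultdict


def pods_by_node(pod_list_json):
    """Group pod names by the K8s node each pod is scheduled on."""
    pod_list = json.loads(pod_list_json)
    placement = defaultdict(list)
    for pod in pod_list.get("items", []):
        # spec.nodeName is missing while a pod is still unscheduled.
        node = pod.get("spec", {}).get("nodeName")
        placement[node].append(pod["metadata"]["name"])
    return dict(placement)


def colocated(placement):
    """Keep only the nodes that host more than one pod."""
    return {
        node: pods
        for node, pods in placement.items()
        if node is not None and len(pods) > 1
    }
```

If the head pod and a worker pod appear under the same node in the output of `colocated(pods_by_node(...))`, traffic between them never leaves that machine, which would make the K8s numbers look better than a setup with separate virtual machines.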

GoingMyWay · July 24, 2022, 3:33pm

Hi @Dmitri, both the k8s and the GCP clusters have the same settings. I have a data buffer running on the head node, and ray.get is called on the head node. I think the CPUs and RAM may be weaker.
