How severely does this issue affect your experience of using Ray?
High: It blocks me from completing my task.
I have two nodes connected with InfiniBand. The network bandwidth is 25 GB/s, and the iperf tool can achieve about 24 GB/s.
I create a Ray cluster with these two nodes connected via IPoIB: one is the head and the other is a worker. Then I create 20 actors on the head node and 100 tasks on the worker node. Each actor holds a big random numpy array (about 2.5 GB), and each task gets the numpy array from a remote actor. These tasks are executed concurrently. I expected the network speed to be about 20 GB/s, but the actual speed is 5 GB/s.
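A simplified sketch of the setup (the names here are placeholders, and the real code additionally pins the actors to the head node and the tasks to the worker node, which I have omitted):

import numpy as np
import ray

ray.init(address="auto")

@ray.remote
class ArrayHolder:
    def __init__(self):
        # ~2.5 GB of random float64 data held by the actor (on the head node).
        self.array = np.random.rand(312_500_000)

    def get_array(self):
        return self.array

@ray.remote
def fetch(holder):
    # Runs on the worker node and pulls the whole array over the network.
    return ray.get(holder.get_array.remote()).nbytes

holders = [ArrayHolder.remote() for _ in range(20)]
refs = [fetch.remote(holders[i % 20]) for i in range(100)]
ray.get(refs)  # the 100 fetches run concurrently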
I think this may be because the plasma object store or raylet configuration is not optimized (I used the default configuration). iperf can achieve 24 GB/s with 8 parallel connections, so I guess network utilization would improve if the gRPC concurrency used for transferring objects were increased, but I didn’t find any information about Ray's plasma configuration.
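Something along these lines is what I have in mind. Note that _system_config is internal and undocumented, and the two keys below are only my guesses at the relevant object-manager settings; they may not exist or may have different names in the installed Ray version:

import ray

# Hypothetical tuning attempt: the keys below are assumptions and should be
# checked against the Ray version in use before relying on them.
ray.init(
    _system_config={
        "object_manager_default_chunk_size": 20 * 1024 * 1024,  # larger transfer chunks
        "object_manager_max_bytes_in_flight": 4 * 1024**3,      # more bytes in flight concurrently
    }
)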
I want to fully utilize the network bandwidth with Ray as the basic distributed programming framework. Can anyone give me some guidance on this?
Any updates on this? I observed similar behavior on a smaller scale: I have 2 VMs running in the cloud, and iperf3 reports a bandwidth of roughly 1.95 GB/s. I conducted some measurements to see whether I could saturate the bandwidth between both VMs using Ray, with a client submitting data to an actor on the other VM. My code, at a high level, looks as follows. I have a Receiver actor that receives a bytes payload and returns just its first byte, to make sure that the object itself is transferred over the network and not just the ObjectRef. dummy_resource is only there to ensure that the actor runs on the remote node, so that objects have to be transferred over the network.
The client invokes the actor with samples of different sizes. The total number of bytes across all samples is always 1 GB. I measure the end-to-end latency it takes to send 1 GB of data to the actor and receive all responses.
import time

import ray


@ray.remote(resources={"dummy_resource": 1.0})
class Receiver:
    def recv(self, a: bytes):
        # Return only the first byte so the payload itself has to be
        # transferred over the network, not just the ObjectRef.
        return a[0]


def send(receiver, samples: list[bytes], open_loop: bool) -> float:
    start_t = time.perf_counter()
    if open_loop:
        # Open loop: submit all samples at once, then wait for all responses.
        refs = [receiver.recv.remote(sample) for sample in samples]
        ray.get(refs)
    else:
        # Closed loop: send one sample at a time, waiting for each response.
        for sample in samples:
            ray.get(receiver.recv.remote(sample))
    end_t = time.perf_counter()
    lat_t = end_t - start_t
    return lat_t
I chose sample sizes from 64 KB to 64 MB. Again, the total amount of bytes sent is always 1 GB. However, I cannot get beyond a bandwidth of 0.75 GB/s.
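The driver is roughly the following (a simplified sketch: the real sweep covers more sizes between 64 KB and 64 MB, and it assumes the cluster is already running with the custom dummy_resource on the remote node):

ray.init(address="auto")
receiver = Receiver.remote()

TOTAL_BYTES = 1 * 1024**3  # always 1 GB in total
for sample_size in (64 * 1024, 1 * 1024**2, 16 * 1024**2, 64 * 1024**2):
    n_samples = TOTAL_BYTES // sample_size
    samples = [bytes(sample_size) for _ in range(n_samples)]
    lat_t = send(receiver, samples, open_loop=True)
    print(f"{sample_size:>10} B samples: {TOTAL_BYTES / lat_t / 1e9:.2f} GB/s")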
I also tried making the client multithreaded, i.e. equally partitioning the 1 GB of data among all client threads, but I still cannot get beyond 0.75 GB/s.
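The multithreaded variant is essentially this sketch (it reuses Receiver, send, and receiver from the snippets above; the thread count is an arbitrary example):

from concurrent.futures import ThreadPoolExecutor

# Split 1 GB of samples evenly across client threads, each doing an
# open-loop send to the same Receiver actor.
sample_size = 1 * 1024**2
samples = [bytes(sample_size) for _ in range(1024)]  # 1 GB total
n_threads = 8  # arbitrary, for illustration
chunk = len(samples) // n_threads
partitions = [samples[i * chunk:(i + 1) * chunk] for i in range(n_threads)]
with ThreadPoolExecutor(max_workers=n_threads) as pool:
    latencies = list(pool.map(lambda part: send(receiver, part, True), partitions))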
Is this due to the overhead of object transfer over gRPC? I’m also running Ray in its default configuration.