How to fully utilize network bandwidth when getting data from remote nodes

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I have two nodes connected with InfiniBand. The network bandwidth is 25GB/s, and the iperf tool achieves about 24GB/s between them.
I created a Ray cluster from these two nodes, connected over IPoIB: one is the head node and the other is a worker node. I then created 20 actors on the head node and 100 tasks on the worker node. Each actor holds a large random numpy array (about 2.5GB), and each task fetches the numpy array from a remote actor. The tasks are executed concurrently. I expected the aggregate transfer speed to be about 20GB/s, but the actual speed is 5GB/s.
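A minimal sketch of this kind of setup is below. It is not my exact script and makes several assumptions: the custom resource names `head` and `worker` are hypothetical and would have to be declared when starting each node (for example with `ray start --resources=...`), the names `ArrayHolder`, `fetch`, and `run_benchmark` are illustrative only, and the worker node needs enough CPUs for the tasks to actually run concurrently.

```python
import time

GB = 1e9  # decimal gigabyte, matching the GB/s figures quoted above


def aggregate_throughput_gb_s(total_bytes, elapsed_s):
    """Aggregate transfer rate in GB/s for total_bytes moved in elapsed_s seconds."""
    return total_bytes / elapsed_s / GB


def run_benchmark(num_actors=20, num_tasks=100, array_bytes=int(2.5 * GB)):
    # Imported here so the helper above can be used without Ray installed.
    import numpy as np
    import ray

    ray.init(address="auto")  # connect to the already running cluster

    # ASSUMPTION: the head node was started with a custom "head" resource.
    @ray.remote(resources={"head": 0.01})
    class ArrayHolder:
        def __init__(self, nbytes):
            # float64 is 8 bytes per element
            self.array = np.random.rand(nbytes // 8)

        def ready(self):
            return True

        def get_array(self):
            return self.array

    # ASSUMPTION: the worker node was started with a custom "worker" resource.
    @ray.remote(resources={"worker": 0.01})
    def fetch(holder):
        # Pulls the array from the actor's node into the local object store.
        arr = ray.get(holder.get_array.remote())
        return arr.nbytes

    holders = [ArrayHolder.remote(array_bytes) for _ in range(num_actors)]
    # Wait for actor construction so it is not counted in the timing.
    ray.get([h.ready.remote() for h in holders])

    start = time.perf_counter()
    sizes = ray.get(
        [fetch.remote(holders[i % num_actors]) for i in range(num_tasks)]
    )
    elapsed = time.perf_counter() - start
    return aggregate_throughput_gb_s(sum(sizes), elapsed)
```

Calling `run_benchmark()` from a driver on the head node returns the measured aggregate rate in GB/s. With 100 transfers of 2.5GB each (250GB in total), a rate of 20GB/s would correspond to about 12.5 seconds, while 5GB/s corresponds to about 50 seconds.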
I think this may be because the plasma object store or raylet configuration is not optimized (I used the default configuration). iperf can achieve 24GB/s with 8 parallel connections. Based on this, I guess network utilization would improve if the gRPC concurrency used for transferring objects were increased, but I didn't find any information about Ray's plasma configuration.
I want to fully utilize the network bandwidth with Ray as the underlying distributed programming framework. Can anyone give me some guidance on this?