Fetching an object from remote memory

Hello! I am trying to figure out where most of the time is spent when fetching an object from a remote object store to the local one.

I have a Google cluster, and the bandwidth between the VMs is ~15 Gbps (measured with netperf). I want to send an object of, let’s say, 5 MB between two Ray nodes. The chunk size is also 5 MB, meaning the object will be sent in one chunk. I measure most of the operations that happen when reading a remote object (communication between Raylets, creating a new object in the local Plasma store, etc.).

I cannot figure out how long the actual transfer of the object between the two nodes takes. Given the 15 Gbps network, the 5 MB chunk should be transferred in ~3 ms. However, my experiments show that it takes ~10 ms from the sender initiating a “push” request to the receiver getting the object and sending the reply back. I have excluded all work that is done on the receiver side (creating the chunk, memcpy, sealing the object). I understand that there is overhead from the gRPC calls here, but I am not sure why I see this difference. Could someone please help?
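For reference, here is a small sketch of the back-of-the-envelope numbers above. It assumes 5 MB means 5 * 10^6 bytes and that the full 15 Gbps is available to a single stream, and it ignores latency, TCP ramp-up, and gRPC framing. The function names are only illustrative and are not part of Ray.

```python
def ideal_transfer_ms(size_bytes: int, bandwidth_gbps: float) -> float:
    """Time to put size_bytes on the wire at the given line rate, in ms.

    Assumption: pure serialization time, no latency or protocol overhead.
    """
    bits = size_bytes * 8
    return bits / (bandwidth_gbps * 1e9) * 1e3


def effective_gbps(size_bytes: int, elapsed_ms: float) -> float:
    """Throughput implied by moving size_bytes in elapsed_ms, in Gbps."""
    bits = size_bytes * 8
    return bits / (elapsed_ms / 1e3) / 1e9


if __name__ == "__main__":
    size = 5 * 1000 * 1000  # assumed: 5 MB as decimal megabytes
    print(f"ideal transfer time: {ideal_transfer_ms(size, 15.0):.2f} ms")
    print(f"throughput implied by 10 ms: {effective_gbps(size, 10.0):.2f} Gbps")
```

Under these assumptions the ideal time comes out slightly below 3 ms, and the ~10 ms I measure corresponds to roughly 4 Gbps of effective throughput.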

Thank you!