How `ray.wait()` works

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.
    ray.wait() has a fetch_local option. When it is set to False, I assume Ray only checks whether the object references are ready somewhere in the Ray cluster and does not transfer the data.
    I tried to follow the C++ code, but it appears to still call Get() on the objects. Here is what I found.

I noticed this because I am trying Ray on a cluster that has both InfiniBand (IB) and Ethernet. I run the same script over each network, but the time spent in ray.wait() differs. Since IB has higher bandwidth, I expected ray.wait() to take less time over IB, but the results show the opposite: Ethernet is faster. I also timed ray.get(); there the time with IB is shorter, which seems reasonable.

I tried another script and got some other weird results. I ran it on two nodes with 32 cores each, so I can be sure that both nodes run set().

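import sys
import time

import numpy as np
import ray


# Each task reserves 32 CPUs, i.e. all cores of one node.
@ray.remote(num_cpus=32)
def set(data):
    h = data.shape  # reads the argument; h is otherwise unused
    x = ray.put(np.ones(1024 * 1024 * 1024))  # 8 GiB of float64
    time.sleep(15)
    return x  # returns the inner ObjectRef, not the array


if __name__ == '__main__':

    # Connect to the running Ray cluster.
    ray.init(address='auto')
    print(ray.cluster_resources())
    num_node = int(sys.argv[1])
    print('get time,wait time,remote time')
    for _ in range(10):
        remote_time = time.time()
        # Submit one task per node; each takes an 8 GiB array as its argument.
        lix = [set.remote(np.ones(1024 * 1024 * 1024)) for _ in range(num_node)]
        wait_time = time.time()
        li = lix
        # Loop until every task has finished, with fetch_local=False.
        while len(li) != 0:
            done, li = ray.wait(li, fetch_local=False)
        start_get = time.time()
        l = ray.get(lix)
        end_get = time.time()
        print(end_get - start_get, start_get - wait_time, wait_time - remote_time, len(l))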
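
For scale, here is a rough sketch of how much data one `np.ones(1024*1024*1024)` array holds and how long it would take to move over a link in the ideal case. The link speeds and the names `transfer_seconds` and `ASSUMED_LINKS` are hypothetical placeholders, not measurements from this cluster, and the estimate ignores serialization, scheduling, and protocol overhead.

```python
# Back-of-envelope transfer times for the arrays used in the script.
# The link speeds below are assumptions, not measured values.

FLOAT64_BYTES = 8                     # np.ones defaults to float64
N_ELEMENTS = 1024 * 1024 * 1024       # same element count as in the script
PAYLOAD_BYTES = N_ELEMENTS * FLOAT64_BYTES


def transfer_seconds(n_bytes, link_gbit_per_s):
    """Ideal time to move n_bytes over a link, ignoring all overhead."""
    return n_bytes * 8 / (link_gbit_per_s * 1e9)


# Hypothetical nominal link speeds in Gbit/s; substitute your own.
ASSUMED_LINKS = {"ethernet_10g": 10, "infiniband_100g": 100}

print(f"payload: {PAYLOAD_BYTES / 2**30:.0f} GiB")
for name, gbit in ASSUMED_LINKS.items():
    print(f"{name}: {transfer_seconds(PAYLOAD_BYTES, gbit):.2f} s")
```

Since the script also sleeps for 15 seconds inside each task, the time spent in the wait loop cannot drop below that no matter which network is used; any difference between the two networks sits on top of that floor.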

Here is the result:

With IB, the wait time is short but varies enormously. The remote time with IB is longer than with Ethernet. With Ethernet, the wait time and remote time are long but stable.

The results confuse me. Can someone explain this?
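
One way to narrow down where the variation comes from is to time each `ray.wait()` call separately instead of the whole loop. The following is a minimal sketch, not part of the original script: `timed_drain` and `fake_wait` are hypothetical names, and `fake_wait` is only a stand-in so the example runs without a cluster. It assumes the wait function behaves like `ray.wait`, taking a list and returning a `(ready, not_ready)` pair.

```python
import time


def timed_drain(refs, wait_fn, **wait_kwargs):
    """Call wait_fn until nothing is pending; record each call's duration.

    wait_fn must behave like ray.wait: accept a list and return
    (ready, not_ready). timed_drain is a hypothetical helper name.
    """
    pending = list(refs)
    samples = []  # (number of refs reported ready, seconds spent in the call)
    while pending:
        t0 = time.perf_counter()
        ready, pending = wait_fn(pending, **wait_kwargs)
        samples.append((len(ready), time.perf_counter() - t0))
    return samples


# Stand-in for ray.wait so the sketch runs without a cluster:
# it reports the first pending item as ready on every call.
def fake_wait(pending, **_):
    return pending[:1], pending[1:]


samples = timed_drain(["a", "b", "c"], fake_wait, fetch_local=False)
print(samples)
```

With Ray, the same helper would be called as `timed_drain(lix, ray.wait, fetch_local=False)`, which gives one timing per returned object ref and makes it easier to see whether a single slow call dominates the total.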

In your code, ray.get(lix) returns a list of object refs (rather than the actual np arrays), namely the refs created by ray.put in set(). Is that what you want?

Yes, I just want to put the data inside the remote function and get the object refs back. I fetch the actual np array on another node to figure out how get() and wait() work.