How `ray.wait()` works

xyzyx · August 21, 2022, 7:50am

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.
ray.wait() has a fetch_local option. When it is set to False, I expect Ray to only check whether the object references are ready somewhere in the cluster, without transmitting the data.
I tried to follow the C++ code, but as far as I can tell it still calls Get() on the objects. Here is what I found.

I noticed this because I am running Ray on a cluster that has both InfiniBand (IB) and Ethernet. I ran the same script over each network, and the time spent in ray.wait() differs. Since IB has higher bandwidth, I expected ray.wait() to take less time over IB, but the results show the opposite: Ethernet is faster. I also timed ray.get(); it is shorter with IB, which seems reasonable.

I tried another script and got some other weird results. I ran it on two nodes with 32 cores each, so with num_cpus=32 I can make sure both nodes run set().

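import sys
import time

import numpy as np
import ray


@ray.remote(num_cpus=32)
def set(data):
    # data is only inspected here, not used further.
    h = data.shape
    # Store an 8 GiB float64 array in the object store and return its ref.
    x = ray.put(np.ones(1024 * 1024 * 1024))
    time.sleep(15)
    return x


if __name__ == '__main__':

    # Start Ray.
    ray.init(address='auto')
    print(ray.cluster_resources())
    num_node = int(sys.argv[1])
    print('get time,wait time,remote time')
    for _ in range(10):
        remote_time = time.time()
        # One task per node; each call also passes an 8 GiB array as its argument.
        lix = [set.remote(np.ones(1024 * 1024 * 1024)) for _ in range(num_node)]
        wait_time = time.time()
        li = lix
        # Loop until every ref is ready; fetch_local=False asks Ray not to
        # pull the objects to this node.
        while len(li) != 0:
            done, li = ray.wait(li, fetch_local=False)
        start_get = time.time()
        l = ray.get(lix)
        end_get = time.time()
        print(end_get - start_get, start_get - wait_time, wait_time - remote_time, len(l))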

Here is the result:

With IB, the wait time is short but varies enormously, and the remote time is longer than with Ethernet. With Ethernet, the wait time and remote time are long but stable.

The results confuse me. Can someone explain this?

jjyao · August 22, 2022, 5:36am

In your code, ray.get(lix) returns a list of object refs (instead of the actual np arrays), namely the refs generated by ray.put in set(). Is that what you want?

xyzyx · August 22, 2022, 5:41am

Yes, I just want to put the data inside the remote function and get back the object refs. I get the actual np array on another node to figure out how get() and wait() work.
