How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
ray.wait() has an option fetch_local. When it is set to False, I think Ray just checks that the object references are ready in the Ray cluster and doesn't transmit the data.
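As a rough sketch of the behaviour I expected, the loop below polls with fetch_local=False and records when each reference becomes ready. The helper name record_ready_times and the injected wait parameter are my own placeholders (ray.wait would be passed in real use); only the num_returns and fetch_local keyword arguments come from Ray's API.

```python
import time


def record_ready_times(refs, wait, clock=time.perf_counter):
    """Poll until every ref is ready; return (ref, seconds_since_start) pairs.

    `wait` is assumed to behave like ray.wait: it takes a list of pending
    refs and returns (ready, not_ready). This helper is hypothetical and
    not part of Ray; only the keyword arguments mirror ray.wait.
    """
    start = clock()
    pending = list(refs)
    ready_times = []
    while pending:
        # fetch_local=False should only ask whether the object is ready
        # somewhere in the cluster, without pulling it to this node.
        ready, pending = wait(pending, num_returns=1, fetch_local=False)
        elapsed = clock() - start
        ready_times.extend((ref, elapsed) for ref in ready)
    return ready_times
```

In a driver connected to the cluster this would be called as record_ready_times(list_of_object_refs, ray.wait), which makes it easier to see whether the time is spent waiting for the tasks or somewhere else.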
I tried to follow the C++ code, but I found that it still calls Get() on the objects. Here is what I found.
I found this because I tried Ray on a cluster that has both IB (InfiniBand) and Ethernet. I ran the same script on each network, but the time taken by ray.wait() differed. In my opinion, IB has better bandwidth, so the time for ray.wait() should be shorter. But the results show the opposite: Ethernet is faster. I also timed ray.get(). The time with IB is shorter, which seems reasonable.
I tried another script and got some other weird results. I ran the script on two nodes, each of which has 32 cores, so I can make sure both nodes are running set().
import ray
import sys
import numpy as np
import time


# num_cpus=32 reserves a whole 32-core node, so each task lands on its own node.
@ray.remote(num_cpus=32)
def set(data):
    h = data.shape
    # Put a 1024**3-element float64 array (8 GiB) into the object store.
    x = ray.put(np.ones(1024 * 1024 * 1024))
    time.sleep(15)
    return x


if __name__ == '__main__':
    # Start Ray.
    ray.init(address='auto')
    print(ray.cluster_resources())
    num_node = int(sys.argv[1])
    print('get time,wait time,remote time')
    for _ in range(10):
        remote_time = time.time()
        lix = [set.remote(np.ones(1024 * 1024 * 1024)) for _ in range(num_node)]
        wait_time = time.time()
        li = lix
        # Poll until every task result is ready, without fetching it locally.
        while len(li) != 0:
            done, li = ray.wait(li, fetch_local=False)
        start_get = time.time()
        l = ray.get(lix)
        end_get = time.time()
        print(end_get - start_get, start_get - wait_time, wait_time - remote_time, len(l))
Here is the result.
With IB, the wait time is short but varies enormously. The remote time with IB is longer than with Ethernet. With Ethernet, the wait time and remote time are long but stable.
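To put numbers on "varies enormously" versus "stable", the per-iteration timings could be reduced to a mean and a standard deviation per column. This is only a sketch: summarize is a hypothetical helper, and the rows are assumed to hold the get, wait, and remote times in seconds, in the order the script prints them.

```python
import statistics


def summarize(rows):
    """Return {column: (mean, stdev)} for non-empty rows of (get, wait, remote).

    Each row is one iteration of the benchmark loop. The standard deviation
    is reported as 0.0 when there is only a single row.
    """
    names = ("get", "wait", "remote")
    summary = {}
    for i, name in enumerate(names):
        values = [row[i] for row in rows]
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[name] = (statistics.mean(values), spread)
    return summary
```

Comparing the standard deviation of the wait column between the IB run and the Ethernet run would show how much more the IB timings fluctuate.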
The results confuse me. Can someone explain this?