I was experimenting with Ray job scheduling and had a basic question. Consider this simple program:
_____________________________
import time

import ray

@ray.remote
def f(x):
    time.sleep(5)
    return x + 1

x = f.remote(1)
y = f.remote(2)
print(ray.get(x))
print(ray.get(y))
__________________________
I thought this program would take ~5 sec, because x and y would each be computed by an independently running worker, so both would be available after about 5 sec.
However, it takes ~10 sec, implying that x is computed first and only then does the computation of y start.
Is this the correct interpretation? Does this mean Ray doesn't spawn different threads for workers?
Also, is there any good reading to get a quick overview of such topics in Ray?
I ran the same script and it returned within 5 seconds:
import time

import ray

ray.init()

@ray.remote
def f(x):
    time.sleep(5)
    return x + 1

start = time.time()
x = f.remote(1)
y = f.remote(2)
print(ray.get(x))
print(ray.get(y))
print(time.time() - start)
How did you connect to the Ray cluster? Did you use ray start and ray.init(address='auto')?
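For reference, this is the kind of setup I mean (a minimal sketch; the resource numbers in the output are just illustrative):

# On the head node (shell):
#   ray start --head

# In the script, attach to the running cluster instead of starting a
# fresh local Ray instance:
import ray

ray.init(address='auto')          # connect to the cluster started via `ray start`
print(ray.cluster_resources())    # e.g. {'CPU': 2.0, ...} -- what Ray can schedule on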
Also, tasks are scheduled per worker process; we don't have first-class threading support. (Also note that threading doesn't always achieve parallelism in Python because of the GIL, except for I/O-heavy workloads or libraries such as NumPy that release it.)
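For instance, you can check how many CPU slots Ray has registered, which bounds how many tasks run at the same time (a quick sketch; the numbers depend on your machine or cluster):

import ray

ray.init()

# Total resources Ray knows about, e.g. {'CPU': 4.0, ...}.
# By default each task reserves 1 CPU, so this caps concurrent tasks.
print(ray.cluster_resources())

# Resources not currently claimed by running tasks or actors.
print(ray.available_resources())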
I realized that the remote VM I was working with had just 1 CPU. When replying to Archit, I didn't realize that there were two nodes running Ray in my cluster (hence it showed CPU: 2). Sorry for the confusion.
The script finished in ~5 sec when I ran it on my local 4-core machine.
Yes, I understand threading isn't really parallel. I was assuming there was an abstraction of parallelism among workers irrespective of the number of CPUs. Thank you for the clarification.
So, for now, the number of (apparently) parallel tasks is equal to the number of CPUs in the cluster, right?
Yes, that's correct! But you can actually run more worker processes than the number of physical CPUs by setting a high num-cpus (num-cpus doesn't have to match the real number of CPUs). For actors, we technically support threading (with the argument max_concurrency), but it is not really recommended for Python (for Java, though, we do run many threads per worker process).
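To illustrate both points (just a sketch; the Sleeper class and the numbers are made up for the example):

import time

import ray

# Tell Ray to expose 8 CPU slots even if the machine has fewer physical
# cores; up to 8 tasks can then be scheduled at once.
ray.init(num_cpus=8)

# A threaded actor: up to 2 method calls may run concurrently inside the
# same worker process (of limited use in Python because of the GIL).
@ray.remote(max_concurrency=2)
class Sleeper:
    def slow(self):
        time.sleep(5)
        return 1

s = Sleeper.remote()
# Both calls can be in flight at the same time within one actor.
print(ray.get([s.slow.remote(), s.slow.remote()]))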
For completeness: setting num-cpus seems to give threading-like behaviour.
On a one-core machine:
Initialising with ray start --head --num-cpus=1, the above program takes ~10 sec.
Initialising with ray start --head --num-cpus=2, the above program takes ~5 sec.
Oh, I think that's because a process that calls sleep isn't using the CPU (so it looks like there's parallelism here). It's similar to the scenario where you run many I/O operations with Python threads.
@sangcho Thanks. Yeah, that's probably a bad example. I meant that increasing num-cpus gives threading-like behaviour (maybe not useful, but just for pedagogical reasons :p). Here's probably a better example:
import time

import numpy as np
import ray

ray.init()  # or ray.init(address='auto') to attach to a cluster started with `ray start`

@ray.remote
def f(a):
    # CPU-bound work: repeated element-wise multiplications on a large array.
    print('entering...')
    d = a * a
    for i in range(50):
        d = d * a
    print('middle...')
    for i in range(50):
        d = d * a
    return 0

a = np.random.rand(3000, 3000)
start = time.time()
x = f.remote(a)
y = f.remote(a)
print(ray.get(x))
print(ray.get(y))
print(time.time() - start)