Ray on single machine. No threading?

Hi,

I was experimenting with Ray job scheduling and had a basic question. Consider this simple program:

_____________________________

import ray
import time

# ray.init() is needed before submitting tasks; use ray.init(address='auto')
# if you are connecting to an already running cluster
ray.init()

@ray.remote
def f(x):
    time.sleep(5)
    return x + 1

x = f.remote(1)
y = f.remote(2)

print(ray.get(x))
print(ray.get(y))

__________________________

I thought this program would take ~5 seconds, because x and y would each be computed by an independently running worker, in which case both would be available after about 5 seconds.

However, it takes ~10 seconds, implying that x is computed first and only then does the computation of y start.

Is this the correct interpretation? Does this mean Ray doesn't spawn separate threads for workers?

Also, is there any good reading to get a quick overview of such topics in Ray?

Thanks,
Chirag

Hi @Chirag,

That's odd; your code sample should run in about 5 seconds. If you run ray.available_resources(), do you see at least two CPUs available?
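
For example, you could run something like this in the same session (a minimal sketch, assuming a plain local ray.init(); the exact numbers will differ on your machine):

import ray

ray.init()
# The two f.remote() calls can only run in parallel if at least 2 CPUs show up here
print(ray.available_resources())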

Hi @architkulkarni, thanks for the reply.

Yes, it does. It shows 'CPU': 2.0.

As an aside, does the number of cores matter? Can't we have threads like in Python's threading module?

Thanks,
Chirag

I ran the same script and it returned within 5 seconds:

import ray
import time

ray.init()

@ray.remote
def f(x):
    time.sleep(5)
    return x + 1

start = time.time()
x = f.remote(1)
y = f.remote(2)

print(ray.get(x))
print(ray.get(y))
print(time.time() - start)

How did you connect to the Ray cluster? Did you use ray start and ray.init(address='auto')?
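
For reference, that pattern looks roughly like this (a sketch, assuming a manually started single-node cluster):

# In a shell on the head node:
#   ray start --head

# Then, in the Python script, connect to that running cluster:
import ray
ray.init(address='auto')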

Also, tasks are scheduled per process; we don't have first-class threading support. (Note that threading doesn't always achieve parallelism in Python because of the GIL, except for IO-heavy workloads or libraries such as NumPy that release it.)

If threading support is important for your use case, please create a feature request on our GitHub page!

Thank you, @sangcho!

I realized that the remote VM I was working with had just 1 CPU. When replying to Archit, I didn't realize that there were two nodes running Ray in my cluster (hence it showed 'CPU': 2.0). Sorry for the confusion.
The program ran in ~5 seconds on my local 4-core machine.

Yes, I understand that threading isn't truly parallel. I was assuming there was an abstraction of parallelism among workers regardless of the number of CPUs. Thank you for the clarification.

So for now, the number of (apparently) parallel tasks is equal to the number of CPUs in the cluster, right?

Thanks,


Yes, that's correct! But you can actually run more worker processes than the number of CPUs by setting a high num-cpus (the logical num-cpus doesn't have to match the real number of CPUs). For actors, we technically support threading (via the max_concurrency argument), but it is not really recommended for Python (for Java, though, we run many threads per worker process).
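
For example (a rough sketch; the num_cpus and max_concurrency values here are just for illustration):

import ray

# Logical CPUs don't have to match the physical core count
ray.init(num_cpus=8)

# Threaded actor: up to 2 method calls may run concurrently in one worker process
@ray.remote(max_concurrency=2)
class Adder:
    def add(self, x):
        return x + 1

a = Adder.remote()
print(ray.get([a.add.remote(1), a.add.remote(2)]))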


Got it! Thanks for the prompt and detailed reply, @sangcho!

For completeness: setting num-cpus higher seems to give thread-like concurrency.

On a one-core machine:

On initialising with ray start --head --num-cpus=1, the above program takes ~10 seconds.
On initialising with ray start --head --num-cpus=2, the above program takes ~5 seconds.

Oh, I think that's because a process that calls sleep is not scheduled on the CPU (so it looks like there's parallelism here). It is similar to the scenario where you have many IO ops with Python multi-threading.
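
To illustrate the analogy with plain Python threads (a minimal sketch, no Ray involved): two threads that only sleep finish in ~5 seconds even on one core, because sleep releases the GIL and doesn't use the CPU.

import threading
import time

def f():
    time.sleep(5)  # sleeping releases the GIL, like waiting on IO

start = time.time()
threads = [threading.Thread(target=f) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(time.time() - start)  # ~5 seconds, not 10, even on a single core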


@sangcho Thanks. Yeah, that's probably a bad example. I meant that increasing num-cpus gives threading-like behaviour (it may not be useful, but just for pedagogical reasons :p). Here's probably a better example:

import ray
import numpy as np
import time

ray.init()

@ray.remote
def f(a):
    print('entering...')
    d = a * a
    for i in range(50):
        d = d * a
    print('middle...')
    for i in range(50):
        d = d * a
    return 0

a = np.random.rand(3000, 3000)

start = time.time()
x = f.remote(a)
y = f.remote(a)

print(ray.get(x))
print(ray.get(y))
print(time.time() - start)

Output with 1 single-core node and num-cpus=1:
[screenshot of timing output]

Output with 1 single-core node and num-cpus=2:
[screenshot of timing output]

Output with 2 single-core nodes:

Thanks.