The use of Python multiprocessing along with Ray

Hi, I’d like to find out whether there are any issues with using Python multiprocessing along with Ray (nested parallelism), e.g., regarding performance or serializing/deserializing objects?

import pandas
from multiprocessing import Pool

import ray

ray.init()

df = pandas.DataFrame([1])
o_ref = ray.put(df)  # put the DataFrame into Ray's shared object store

def f(obj):
    # Fetch the shared DataFrame from the object store in the worker process.
    local_df = ray.get(o_ref)
    # Return a new DataFrame instead of mutating in place; objects returned
    # by ray.get can be backed by read-only buffers.
    return local_df + obj

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1]))

I’d appreciate it if you could point me to some docs on this, if any exist.

Thanks in advance!

@sangcho, any comments/thoughts?

The normal Python multiprocessing can cause conflicts through nesting and excessive resource use (i.e., launching duplicated Ray clusters on the same machine), but you can use the Ray-integrated multiprocessing instead: Distributed multiprocessing.Pool — Ray v1.9.0
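For reference, here is a minimal sketch of the example above rewritten against the Ray-integrated pool. ray.util.multiprocessing.Pool mirrors the standard-library Pool API, but its workers are Ray actors scheduled by the existing Ray cluster rather than raw OS processes:

import pandas

import ray
from ray.util.multiprocessing import Pool

ray.init()

df = pandas.DataFrame([1])
o_ref = ray.put(df)

def f(obj):
    # Runs inside a Ray actor, so ray.get talks to the same cluster
    # rather than trying to start a new one in a forked process.
    local_df = ray.get(o_ref)
    return local_df + obj

if __name__ == '__main__':
    # Pool "processes" here are Ray actors managed by the cluster.
    with Pool(5) as p:
        print(p.map(f, [1]))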

I don’t quite get it. What do you mean by duplicated Ray clusters? Will we end up with multiple driver processes? That would not be acceptable.

Yes, Ray should be managing all the parallelism of your program. Trying to mix raw processes with Ray is an anti-pattern. Instead, use Ray tasks for parallelism, or the Ray multiprocessing library.
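As an example, here is a minimal sketch of the same workload expressed as a Ray task instead of a multiprocessing pool. Passing the ObjectRef as a task argument lets Ray resolve it to the DataFrame before the function runs:

import pandas

import ray

ray.init()

df = pandas.DataFrame([1])
o_ref = ray.put(df)

@ray.remote
def f(local_df, obj):
    # Ray resolves the ObjectRef argument to the actual DataFrame
    # before invoking the task on one of its worker processes.
    return local_df + obj

# Launch one task per input; Ray handles scheduling and serialization.
print(ray.get([f.remote(o_ref, x) for x in [1]]))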

@ericl, thanks a lot! By the way, did you encounter any issues when using Python multiprocessing along with Ray? The reason I’m asking is that the example I posted works without any error.

@ericl, just a friendly reminder.

@YarShev, Ray is designed to manage the entire distributed application. If your application is launching multiple Ray clusters internally, that’s quite strange and can cause issues like running out of memory, not to mention that the separate Ray clusters won’t be able to communicate with each other.

@ericl, thank you for the answer!