How to maximize parallelization efficiency using Python Ray ActorPool?

raylei · October 18, 2022, 5:27pm

I recently started using Ray's ActorPool to parallelize my Python code on my local computer (using the code below), and it’s definitely working. Specifically, I use it to process a list of arguments and return a list of results. (Note that depending on the input, the “process” function can take different amounts of time.)

However, while testing the script, it seems that the processes are sort of “blocking” each other: if one process takes a long time, the other cores appear to stay more or less idle. It’s definitely not completely blocking, since running it this way still saves a lot of time compared to running on one core. But many of the cores stay idle (more than half of them at <20% usage) even though I’m running the script on all 16 cores. This is especially noticeable when there is a long process, in which case only one or two cores are actually active. Also, the total time saved is nowhere near 16x.

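from ray.util import ActorPool
from tqdm import tqdm

# actors: list of actor handles; args: list of inputs; length: number of inputs
pool = ActorPool(actors)
poolmap = pool.map(
    # pass the lambda's own value parameter (v) to the actor
    lambda a, v: a.process.remote(v),
    args,
)
# map() yields results in submission order
result_list = list(tqdm(poolmap, total=length))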

I suspect this is because the way I retrieve the result values (last line) is not optimal, but I'm not sure how to make it better. Could you help me improve it?

raylei · October 27, 2022, 7:24pm

Just want to bump this; I'm really looking for help here.

cade · October 31, 2022, 9:03pm

Hmm, looking through the ray.util.actor_pool source (Ray 2.0.1), it seems like the cores should be evenly used. Perhaps there is a bug somewhere. I’ll try to repro soon to better understand what’s going on.

raylei · November 1, 2022, 8:18pm

@cade Sounds good, thank you so much!
Just wanted to make sure: am I using the ActorPool class correctly, especially the way I get the results?
And are there alternative ways to parallelize this with Ray that I should try?

raylei · November 15, 2022, 4:27pm

Hi @cade, how is it going? Just wanted to check back with you in case there’s any update.
Please let me know if you need more information to reproduce the bug.
Also, I would really appreciate it if you could point me to an alternative way of doing this, since I understand the actor_pool class is about to be deprecated.

Thanks again!
