Worker initializer in ray.util.multiprocessing.Pool

Yoav · February 15, 2021, 8:40pm

I have an expensive initialization that I want each worker to perform before it starts processing tasks. The resulting object should then be accessible to all tasks on that worker.
With actors this is straightforward: I initialize it in the constructor.

However, I am not clear on how to do it with tasks, and how to do it with the ray.util.multiprocessing.Pool.

Specifically, the initializer in the Pool API does not return any value. The pattern I know from Python's multiprocessing.Pool is to assign the object to a global variable, which is then accessible from the worker task as well. However, this does not seem to work with the Ray Pool. What is the intended usage pattern for initializers in the Ray multiprocessing Pool?
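
For reference, here is a minimal sketch of the global-variable pattern described above, using Python's standard-library multiprocessing.Pool (not the Ray Pool). The names init_worker, process, and run_stdlib_pool are hypothetical, and a small dict stands in for the expensive object.

```python
import multiprocessing as mp

# Per-process global; each worker process gets its own copy,
# populated by the initializer below.
_model = None


def init_worker(scale):
    """Runs once in each worker process when the pool starts it."""
    global _model
    _model = {"scale": scale}  # stand-in for the expensive object


def process(x):
    """Task function: reads the object stored by init_worker in this process."""
    return x * _model["scale"]


def run_stdlib_pool(values, scale=3, processes=2):
    """Map `process` over `values` with a pool whose workers were initialized once."""
    with mp.Pool(processes=processes, initializer=init_worker,
                 initargs=(scale,)) as pool:
        return pool.map(process, values)
```

Calling run_stdlib_pool([1, 2, 3]) from under an `if __name__ == "__main__":` guard runs the initializer once per worker process rather than once per task.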

sangcho · February 15, 2021, 10:38pm

cc @eoakes, do you know how we can achieve this? It seems to be a common use case for our multiprocessing pool.

Alex · February 15, 2021, 10:42pm

Two things come to mind:

1. In general, if you want global state, you can wrap it in an actor, then ray.get() it inside your parallelized function. The caveat is that this incurs deserialization overhead (which can be large if your object is a large, non-array-like object).
2. Use Actor Pool, which is built for this exact case. You could even wrap it and call it from your pool map function if you wanted.

Yoav · February 15, 2021, 11:32pm

I am indeed using the ActorPool now, which works well, although I need to specify in advance how many actors I will have. I was under the impression that the Ray multiprocessing.Pool grows and shrinks automatically with the number of tasks and the current cluster size (i.e., if I supply many tasks, it will create more actors and autoscale up). Or is that just wishful thinking on my part?

Alex · February 16, 2021, 1:10am

Oh, I see. I think the multiprocessing pool defaults to creating one actor per CPU in the cluster; I'm not aware of any fancy tricks there.

You're right that ActorPool doesn't have a way of adding actors to an existing pool right now, but it should be pretty easy to add (as long as someone is willing to implement it). Do you mind filing a GitHub feature request?

Andrea_Pisoni · November 7, 2022, 3:30pm

@Alex, is using the ActorPool still the recommended approach for situations where you want a pool of processes and have expensive initialisation? I see it's deprecated now.

ClarenceNg · November 15, 2022, 9:14am

As noted in "Deprecation of ray.utils.ActorPool" (#9 by ClarenceNg), the actor pool is no longer deprecated.

ClarenceNg · November 15, 2022, 9:27am

@Yoav

Regarding your question about expensive initialization: if it is about process warming or code loading, Ray should already handle that, given that we do some caching and re-use of workers.

Otherwise, have you considered using the object store, i.e. ray.put() once up front and ray.get() at the beginning of the task?
