Actor launch overhead question

marsupialtail · September 30, 2022, 6:25pm

How severely does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty in completing my task, but I can work around it.

I have some experiments on actor launch overhead here: [Ray Core] large actor launch overhead. · Issue #28777 · ray-project/ray · GitHub

In my application, I need to pass the actor an object of a custom Python class, let’s say class A. This class is defined in my Python package, pyquokka.

Let’s say we have an actor that looks like this:

import ray

@ray.remote
class B:
    def __init__(self, obj):
        self.obj = obj

I discovered that launching this actor like this:

from pyquokka import A
actor = B.remote(A)

is 2x slower than copying the definition of A into the same Python file, or into another file in the same directory as the Ray script you are running, like this:

class A: ......

@ray.remote
class B: ....
actor = B.remote(A)

I have no idea why this happens, and the measurements suggest the slowdown persists when launching multiple Ray actors. Can someone give me some pointers?

rickyyx · October 3, 2022, 4:42pm

Interesting - thanks for flagging this. Let me follow up with folks who know more about the actor init path.

In the meantime, would you mind sharing the measurement data as well (e.g., how much slower the non-embedded version is)?

rickyyx · October 3, 2022, 5:51pm

Another thing that would be good to verify: could you also time the import statement itself?
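One way to take that measurement is sketched below. The helper name time_import is made up for illustration, and "json" is only a stand-in for the package in question. Note that an import is only slow the first time a process performs it, so the helper also reports whether the module was already loaded.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return (seconds, was_cached) for importing module_name in this process."""
    was_cached = module_name in sys.modules
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start, was_cached


# "json" is a placeholder; swap in "pyquokka" to measure the package from this thread.
elapsed, was_cached = time_import("json")
print(f"import took {elapsed:.4f}s (already cached: {was_cached})")
```

Run it in a fresh interpreter so that the first, uncached import is the one being timed.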

marsupialtail · October 3, 2022, 6:11pm

Interesting – the import is quite slow, and it accounts for the difference between the two. But that should already be done outside of the actor init, right? So does the actor init basically trigger another import?

rickyyx · October 3, 2022, 6:26pm

Ah, thanks for verifying. I believe Ray has to package the dependencies on demand to initialize the actors, in case the actor runs remotely.

Maybe you could try providing this as a runtime env dependency in ray.init, so the cost is amortized at cluster startup? See the Ray 2.0.0 docs: Environment Dependencies
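A minimal sketch of that suggestion follows. It assumes pyquokka can be installed with pip on the workers; the helper names build_runtime_env and start_ray are hypothetical. Only the runtime_env argument of ray.init and its "pip" field are taken from Ray's Environment Dependencies documentation. Whether this actually removes the launch overhead in this case is untested here.

```python
def build_runtime_env(pip_packages):
    """Build a runtime_env dict asking Ray to install the given pip packages."""
    return {"pip": list(pip_packages)}


# Assumption: the package is pip-installable under this name.
runtime_env = build_runtime_env(["pyquokka"])


def start_ray(env):
    """Start Ray with the given runtime environment (hypothetical helper)."""
    # Imported lazily so the dict above can be built without Ray installed.
    import ray

    return ray.init(runtime_env=env)
```

Calling start_ray(runtime_env) once at the top of the driver script sets the environment for the whole job, rather than per actor.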

sangcho · October 4, 2022, 7:11am

In Ray, you (1) serialize the class definition and (2) deserialize it on the workers before initializing an actor. When serialization happens, references to all the modules it uses are included, and when it is deserialized, all of those referenced modules are imported. So if class A is in the same file, there is no import cost (since it can be found in the same file), but when you import A from a package, that import has to happen on the worker before the actor is initialized.

For this case, you could try this: [runtime_env] Support a worker setup hook that runs before importing Ray · Issue #19640 · ray-project/ray · GitHub. You can set the env var that imports all modules ahead of time.
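The by-reference behaviour described above can be seen with the standard library alone. This sketch uses plain pickle rather than Ray's own serializer, and fractions.Fraction as a stand-in for a class imported from a package, so it illustrates the mechanism only and is not Ray's code path.

```python
import fractions
import pickle
import sys

# Pickling an importable class stores its module and name, not the class body.
payload = pickle.dumps(fractions.Fraction)
stored_by_reference = b"fractions" in payload and b"Fraction" in payload

# Simulate a fresh worker process that has not imported the module yet.
del sys.modules["fractions"]
module_loaded_before = "fractions" in sys.modules

# Unpickling has to import the module again to resolve the reference.
restored = pickle.loads(payload)
module_loaded_after = "fractions" in sys.modules

print(stored_by_reference, module_loaded_before, module_loaded_after)
```

If the referenced module is slow to import, that cost is paid at the pickle.loads step, which in the actor case happens on the worker.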

marsupialtail · October 4, 2022, 4:24pm

What exactly happens then? If I have an env var that imports my files, will actors that launch no longer import them?

sangcho · October 5, 2022, 1:42am

In this case, the import starts as soon as the worker starts. So when the actor is actually deserialized, the import will be cheaper (Python sees that the module is already imported and treats the second import as a no-op). Please note that I am not 100% sure it will work, but I think it is worth trying. We are also trying to make this experience better soon, by the way. cc @Chen_Shen
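The idea can be sketched in plain Python. The variable name PRELOAD_MODULES and the helper preload_modules are hypothetical and are not Ray settings; the actual hook is whatever the linked issue ends up providing. "json" stands in for a slow package.

```python
import importlib
import os
import sys

# Hypothetical variable name for illustration; not a Ray setting.
PRELOAD_ENV_VAR = "PRELOAD_MODULES"


def preload_modules(env=os.environ):
    """Import every module named in the comma-separated variable; return the names."""
    raw = env.get(PRELOAD_ENV_VAR, "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    for name in names:
        importlib.import_module(name)
    return names


# Stand-in for worker startup.
loaded = preload_modules({PRELOAD_ENV_VAR: "json"})
cached = sys.modules["json"]

# A later import, such as the one triggered by deserialization, reuses the cached module.
later = importlib.import_module("json")
print(loaded, later is cached)
```

The import cost is still paid once per worker process; the point is only that it moves to worker startup instead of sitting on the actor launch path.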
