How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I have some experiments on actor launch overhead here: [Ray Core] large actor launch overhead. · Issue #28777 · ray-project/ray · GitHub
In my application, I need to pass the actor an object of a custom Python class, let’s say class A. This class is defined in my Python package, pyquokka.
Let’s say we have an actor that looks like this:
import ray

@ray.remote
class B:
    # Note: __init__ takes exactly two underscores on each side.
    def __init__(self, obj):
        self.obj = obj
I discovered that launching this actor like this:
from pyquokka import A
actor = B.remote(A)
is 2x slower than copying the definition of A into the same Python file, or into another file in the same directory as the Ray script you are running, like this:
class A: ......
@ray.remote
class B: ....
actor = B.remote(A)
I have no idea why this happens, and the measurements suggest this scales to multiple Ray actors. Can someone give some pointers?
Interesting - thanks for flagging this. Let me follow up with folks who know more about the actor init path.
In the meantime, would you mind sharing the measurement data as well (e.g. how much slower is the non-embedded version)?
Another thing that would be good to verify - would you also provide a time measurement on the import statement itself?
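One way to take that measurement is sketched below. It assumes a fresh Python process (a module that is already loaded imports almost instantly), and `time_import` is a hypothetical helper name, not part of Ray. The stdlib module `json` stands in for `pyquokka` so the snippet runs anywhere.

```python
import importlib
import time


def time_import(module_name):
    """Import a module by name and return (module, elapsed seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


# Replace "json" with "pyquokka" to time the package discussed in this thread.
module, seconds = time_import("json")
print(f"import {module.__name__} took {seconds:.4f}s")
```

Comparing this number with the difference in actor launch time shows how much of the overhead the import could explain.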
Interesting – the import is quite slow, and it accounts for the difference between the two. But that should already be done outside of the actor init, right? So does the actor init basically run another import?
Ah, thanks for the verification. I believe Ray has to package the dependencies on demand to initialize the actor when the actor runs remotely.
Maybe you could try providing this as a runtime env dependency in ray.init, so that the cost is amortized at cluster startup? See Environment Dependencies in the Ray 2.0.0 docs.
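A rough sketch of what that suggestion might look like is below. `ray.init(runtime_env=...)` and the `"pip"` field are part of Ray's runtime environment API, but listing `pyquokka` as a pip requirement is an assumption about how the package is installed, and `start_ray_with_env` is a hypothetical helper. Whether this actually shortens actor launch is not confirmed in this thread.

```python
# Assumption: pyquokka can be installed on the workers as a pip requirement.
RUNTIME_ENV = {"pip": ["pyquokka"]}


def start_ray_with_env(runtime_env=None):
    """Start Ray so that workers are set up with the given runtime environment."""
    import ray  # imported lazily so the config above can be inspected without Ray

    return ray.init(runtime_env=runtime_env or RUNTIME_ENV)
```

Calling `start_ray_with_env()` once at the top of the driver script sets up the environment before any actor is created.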
In Ray, the class definition is (1) serialized and then (2) deserialized on the workers before an actor is initialized. During serialization, references to all the modules involved are included, and during deserialization all of those references are imported. So if class A is defined in the same file, there is no import cost (it can be found in that file), but if you import A from a package, that import has to happen on the worker before the actor is initialized.
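The by-reference behaviour described above can be illustrated with the standard library alone. This is only a stand-in: Ray serializes with cloudpickle rather than plain `pickle`, and cloudpickle additionally ships classes defined in `__main__` by value. The stdlib class `textwrap.TextWrapper` plays the role of `pyquokka.A`.

```python
import pickle
import sys
import textwrap

# Pickling a class from an importable module stores only its module and name.
payload = pickle.dumps(textwrap.TextWrapper)
print(len(payload), b"textwrap" in payload)

# Simulate a fresh worker process that has not imported the module yet.
del sys.modules["textwrap"]

# Unpickling has to import the module again to look the class up.
restored = pickle.loads(payload)
print("textwrap" in sys.modules, restored.__name__)
```

The payload is tiny because it contains no class body, and the price is paid on the receiving side as a full import of the module.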
For this case, you could probably try this: [runtime_env] Support a worker setup hook that runs before importing Ray · Issue #19640 · ray-project/ray · GitHub. You can set the env var so that all modules are imported ahead of time.
What exactly happens then? If I have an env var that imports my files, actors that launch won’t import them anymore?
In this case, the import starts as soon as the worker starts. So when the actor is actually deserialized, the import will be cheaper (Python sees that the module is already imported and treats the import as a no-op). Please note that I am not 100% sure it will work, but I think it is worth trying. We are also looking into making this experience better soon, btw. cc @Chen_Shen
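The "no-op" part can be checked locally with a small sketch. `cold_and_warm_import` is a hypothetical helper, and the stdlib module `colorsys` stands in for the real package. It only shows Python's module cache at work, not the Ray worker hook itself.

```python
import importlib
import sys
import time


def cold_and_warm_import(module_name):
    """Time a fresh import, then a repeated import of the same module."""
    sys.modules.pop(module_name, None)  # force the first import to do real work
    start = time.perf_counter()
    first = importlib.import_module(module_name)
    cold = time.perf_counter() - start

    start = time.perf_counter()
    second = importlib.import_module(module_name)  # served from sys.modules
    warm = time.perf_counter() - start
    return first is second, cold, warm


same_object, cold, warm = cold_and_warm_import("colorsys")
print(f"cold: {cold:.6f}s, warm: {warm:.6f}s, same module object: {same_object}")
```

If a worker has already imported the package at startup, the import triggered by deserializing the actor class would take the warm path.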