Slow Actor start time due to import overhead of dependencies


I’m working on a realtime video analytics application that depends on the scientific computing stack (numpy, scipy, opencv, torch, sklearn, matplotlib, etc.). It spawns multiple actors for each stream, and we’re running into issues with startup time, which appears to be caused by the import time of these libraries. Each session has its own actor, which in turn spins up further actors for ingest and processing, so the cost adds up: it takes around 15 seconds for the initial ray.get(actor.method.remote()) to return.

Is there a way around this, for example by preallocating a pool of processes for the actors with the initial set of dependencies already loaded?

Have you looked into actor pools?

Yes, but it’s not the easiest solution in my case because my actors are stateful and each one is assigned to an individual video stream.

I’m really looking for a way to have actors preallocated the way tasks are (as explained on the "Using Actors" page of the Ray v2.0.0.dev0 docs).

If not, I’ll probably have to preallocate the actors myself at startup and add some extra logic to assign free ones to new streams.

Actually, tasks are not preallocated in this case. What happens is that Ray keeps a set of pre-created worker processes, and when an actor is created, it picks one of those workers and initializes the actor in it.

I don’t think there’s a clear way to make this pre-import work right now. I recommend doing what you mentioned, but feel free to file an API request on our GitHub issues page!
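
For reference, here is a rough sketch of the "preallocate at startup and hand out free actors" approach discussed above. It is only an illustration, not an official Ray pattern: WarmActorPool, StreamActor, ready, assign and build_ray_pool are hypothetical names, numpy stands in for the real dependency list, and build_ray_pool assumes ray.init() has already been called. The bookkeeping class does not depend on Ray, so it works with any kind of handle.

```python
from collections import deque


class WarmActorPool:
    """Hands out pre-created actor handles to streams and takes them back."""

    def __init__(self, handles):
        self._free = deque(handles)   # handles not assigned to any stream
        self._assigned = {}           # stream_id -> handle

    def acquire(self, stream_id):
        # Return the same handle if the stream already owns one.
        if stream_id in self._assigned:
            return self._assigned[stream_id]
        if not self._free:
            raise RuntimeError("no warm actors left; grow the pool")
        handle = self._free.popleft()
        self._assigned[stream_id] = handle
        return handle

    def release(self, stream_id):
        # Put the stream's handle back so a new stream can reuse it.
        handle = self._assigned.pop(stream_id)
        self._free.append(handle)

    def free_count(self):
        return len(self._free)


def build_ray_pool(size):
    """Create `size` actors up front and block until all have started."""
    import ray  # imported here so WarmActorPool is usable without Ray

    @ray.remote
    class StreamActor:  # hypothetical actor; replace with your own
        def __init__(self):
            import numpy  # heavy imports are paid here, once per actor
            self.stream_id = None

        def ready(self):
            return True

        def assign(self, stream_id):
            # Reset any per-stream state before reusing the actor.
            self.stream_id = stream_id
            return stream_id

    actors = [StreamActor.remote() for _ in range(size)]
    # Waiting on ready() ensures every constructor (and its imports) has run.
    ray.get([a.ready.remote() for a in actors])
    return WarmActorPool(actors)
```

With this, a new stream would call something like pool.acquire(stream_id) and then actor.assign.remote(stream_id), so it only pays for a method call rather than process startup plus imports. The trade-off is that the pool size has to be chosen up front (or grown in the background), and each actor needs a way to reset its state when it is released.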