Keep long-running workers to handle slow initialization

Hello, whenever we start a Ray worker, we need to run some initialization code which takes around 5 seconds.

We were wondering if there’s a way to always have a few IDLE Ray workers that are already initialized, so that we don’t have to pay the initialization cost. Currently, there doesn’t seem to be a way to control how long idle workers stick around. We also couldn’t find any other workaround to keep this initialization cost from slowing down our whole computation.
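To make the pattern concrete, here is a minimal sketch of what we are doing today (`expensive_init`, the 5-second sleep, and `process` are placeholders for our real code): any task that lands on a freshly started worker pays the initialization cost again.

```python
import time
import ray

ray.init()

_initialized = False

def expensive_init():
    # Stand-in for our real setup code, which takes roughly 5 seconds.
    time.sleep(5)

@ray.remote
def process(item):
    # Lazy per-worker initialization: the flag lives in the worker process,
    # so the cost is paid once per worker, not once per task.
    global _initialized
    if not _initialized:
        expensive_init()
        _initialized = True
    return item * 2

# Whenever the autoscaler reclaims idle workers and new ones are started,
# the first task on each fresh worker pays the ~5 s cost again.
print(ray.get([process.remote(i) for i in range(4)]))
```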

Any advice will be greatly appreciated.

Hello, maybe you can try setting minWorkers in the values.yaml file to the number of workers you always need in the cluster. Setting a large idleTimeoutMinutes can also help keep idle workers alive in an autoscaled cluster.
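For example, something along these lines in the Helm chart’s values.yaml should keep a fixed pool of workers around. This is only a rough sketch; the exact key names and layout depend on the chart/operator version you are running, so check your chart’s defaults.

```yaml
# Hypothetical excerpt of values.yaml for the Ray Helm chart; key names may
# differ between chart versions.
podTypes:
  rayWorkerType:
    minWorkers: 4        # always keep at least 4 worker pods in the cluster
    maxWorkers: 10

# Keep idle workers (beyond minWorkers) alive longer before the autoscaler
# scales them down.
idleTimeoutMinutes: 60
```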

Thank you for your response. That could technically work; the only thing missing would be a way to run an initialization function on workers at startup. run_function_on_all_workers doesn’t seem to work.

Maybe try using actors and put the initialization logic in the actor class’s __init__ method.
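If actors work for your use case, a minimal sketch would look like the following (the Worker class, its process method, and the pool size are made up for illustration):

```python
import time
import ray

ray.init()

@ray.remote
class Worker:
    def __init__(self):
        # The expensive setup runs once, when the actor process starts.
        time.sleep(5)  # stand-in for the real ~5 s initialization

    def process(self, item):
        # Later calls reuse the already-initialized actor state.
        return item * 2

# Create the pool of pre-initialized actors up front, then reuse them.
workers = [Worker.remote() for _ in range(4)]
results = ray.get([w.process.remote(i) for i, w in enumerate(workers)])
print(results)
```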

Thanks again for your response. We considered that; the only problem with using actors is that we are planning to use Modin on Ray (see the Modin documentation, “Scale your pandas workflow by changing a single line of code”). Using actors might mean that we cannot plug Modin in directly and might need to fork it and have it use our custom backend. Please let us know if that understanding is incorrect.
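For reference, the usual Modin-on-Ray setup is just the import swap below (assuming modin[ray] is installed); since Modin dispatches DataFrame operations to Ray itself, we don’t see an obvious place to route that work through our own pre-initialized actors without forking it:

```python
import ray
ray.init()

import modin.pandas as pd  # the advertised "single line of code" change from plain pandas

# Modin schedules DataFrame operations on Ray behind the scenes, so there is
# no hook for plugging in our own actor pool.
df = pd.DataFrame({"a": range(10), "b": range(10)})
print(df.sum())
```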

I believe this is currently not supported. But there is a workaround (not recommended, since it doesn’t use the public APIs). You can call certain functions on all workers when they start by calling ray.worker.global_worker.run_function_on_all_workers(function). Note that API stability is not guaranteed for this approach.
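A rough sketch of that workaround is below. The init_worker name is made up, and the callback’s signature is an internal detail (on the Ray versions I’ve seen, it receives a worker_info argument), so expect this to vary between releases:

```python
import time
import ray

ray.init()

def init_worker(worker_info):
    # worker_info is whatever Ray's internal machinery passes to the hook;
    # its contents are an implementation detail of the private API.
    time.sleep(5)  # stand-in for the real ~5 s initialization

# Private, unstable API: it may move or change signature between Ray releases.
ray.worker.global_worker.run_function_on_all_workers(init_worker)
```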