Small tasks: Are your tasks very small? Ray introduces some overhead for each task (the amount of overhead depends on the arguments that are passed in). You are unlikely to see speedups if your tasks take less than ten milliseconds. For many workloads, you can easily increase the size of your tasks by batching them together.
We are testing our data pipeline built on Ray. It works well with large datasets, where each actor has a long lifetime. With small datasets, however, it is extremely slow, and most of the time is spent on actors spinning up and shutting down: because each task's duration is tiny, the pipeline repeatedly launches and tears down 200 actors every few seconds.
- What is the overhead of managing actors (creation, scheduling, teardown)?
- Is the overhead different for large tasks and small tasks?
- Are there any locks or centralized components that limit scalability when there is a huge number of small tasks?
- Is it possible the slowdown is caused by Ray launching a bunch of actors that do nothing useful but spin?