Try to understand the overhead of small tasks

Small tasks: Are your tasks very small? Ray introduces some overhead for each task (the amount of overhead depends on the arguments that are passed in). You are unlikely to see speedups if your tasks take less than ten milliseconds. For many workloads, you can easily increase the size of your tasks by batching them together.
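
To illustrate the batching suggestion, here is a minimal sketch (the function name and workload are made up) that submits a handful of medium-sized tasks instead of thousands of tiny ones:

```python
import ray

ray.init()

@ray.remote
def process_batch(items):
    # Do the per-item work for a whole batch inside one task,
    # so Ray's per-task overhead is amortized over many items.
    return [x * 2 for x in items]  # placeholder work

items = list(range(10_000))
batch_size = 1_000

# 10 tasks instead of 10,000 tiny ones.
batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
results = ray.get([process_batch.remote(batch) for batch in batches])
```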

We are testing our data pipeline built on Ray. It works well with large datasets where each actor has a long lifetime. However, with small datasets the speed is extremely slow, and most of the time is spent on actors spinning up and shutting down. With small datasets the pipeline repeatedly launches and shuts down 200 actors every few seconds, because the duration of each task is tiny.

  • I am trying to understand: what is the overhead of managing actors?
  • Is the overhead different for large tasks and small tasks?
  • Are there any locks or centralized components that affect scalability when it comes to a huge number of small tasks?
  • Is it possible the slowdown is caused by Ray launching a bunch of actors that do nothing useful but just spin?

actors spinning up and shutting down.

Can’t you just share actors in this case?
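
For example, a rough sketch (the actor and method names are hypothetical) of creating a fixed pool of actors once and reusing them for every dataset, rather than launching and killing 200 actors per run:

```python
import ray

ray.init()

@ray.remote
class Worker:
    def process(self, shard):
        # Placeholder for the real per-shard work.
        return sum(shard)

# Create the actors once, up front...
workers = [Worker.remote() for _ in range(8)]

def run_pipeline(shards):
    # ...and reuse the same actors for every run, round-robin,
    # so no actor startup/shutdown cost is paid per dataset.
    futures = [workers[i % len(workers)].process.remote(shard)
               for i, shard in enumerate(shards)]
    return ray.get(futures)

print(run_pipeline([[1, 2, 3], [4, 5], [6]]))
```

If I remember correctly, ray.util.ActorPool gives you a similar reuse pattern out of the box.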

  • I am trying to understand: what is the overhead of managing actors?

All actor creations are initiated by a centralized component called gcs_server. Actor.remote has the overhead of a single IPC (this is not completely async) from worker ↔ gcs_server; it should be pretty low, but higher than task.remote. So, simply put:

worker → (queue creation task) gcs_server → (reply) worker

After that, gcs_server manages the actor's whole lifecycle. If there are many actor creations, this can add load on gcs_server, but once actors are created, gcs_server has minimal involvement (it is used only for fault tolerance).
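
If you want to see this difference yourself, a rough microbenchmark (all names made up) comparing no-op task submission against actor creation could look something like:

```python
import time
import ray

ray.init()

@ray.remote
def noop():
    return None

# num_cpus=0 so 100 idle actors fit on a laptop-sized machine.
@ray.remote(num_cpus=0)
class Noop:
    def ping(self):
        return None

# Plain tasks: scheduled via the decentralized path.
start = time.perf_counter()
ray.get([noop.remote() for _ in range(100)])
print("100 no-op tasks    :", time.perf_counter() - start, "s")

# Actors: each creation is registered with the centralized gcs_server.
start = time.perf_counter()
actors = [Noop.remote() for _ in range(100)]
ray.get([a.ping.remote() for a in actors])  # wait until each actor is actually up
print("100 actor creations:", time.perf_counter() - start, "s")

for a in actors:
    ray.kill(a)
```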

  • Is the overhead different for large tasks and small tasks?

There’s “scheduling overhead”, not a separate small-task / large-task overhead. Afaik, our scheduling latency is very low, but if you have really simple tasks, you might not get the benefit of parallelization. Imagine a task that just computes 1+1!
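
To make the 1+1 point concrete, a small hypothetical comparison of trivial remote tasks against the same work done locally:

```python
import time
import ray

ray.init()

@ray.remote
def add(a, b):
    # The work itself is trivial, so per-task scheduling
    # overhead dominates the runtime.
    return a + b

start = time.perf_counter()
ray.get([add.remote(1, 1) for _ in range(1_000)])
print("1,000 remote adds:", time.perf_counter() - start, "s")

start = time.perf_counter()
_ = [1 + 1 for _ in range(1_000)]
print("1,000 local adds :", time.perf_counter() - start, "s")
```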

  • Are there any locks or centralized components that affect scalability when it comes to a huge number of small tasks?

No. There’s no lock or centralized component. All task scheduling (including actor task scheduling) is done by a decentralized protocol. (Actor creation can be bottlenecked by gcs_server, but actor creation shouldn’t happen that frequently anyway.)

  • Is it possible the slowdown is caused by Ray launching a bunch of actors that do nothing useful but just spin?

Each empty actor should use about 1% CPU and 50–100 MB of memory, so that is unlikely.