Task Dependency or Compiled Graphs

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am working on a project that has a lot of job dependencies. Below is a quick example.

  1. Assume we have 3 steps: A, B, and C, which are responsible for different types of work.
  2. In A, there are 3 sub-steps, A_1, A_2, and A_3, and each A_i contains 1000 parallel tasks. A_2 should not start running until all tasks in A_1 complete, and A_3 should not start running until all tasks in A_2 complete.
  3. In B, there are 4 sub-steps, B_1, B_2, B_3, and B_4, and each B_i contains 2000 parallel tasks. As in A, B_i should not start running until all tasks in B_(i-1) complete, for i >= 2.
  4. In C, there are 2 sub-steps, C_1 and C_2. C_1 should not start running until all tasks in A AND B complete, and C_2 should not start running until C_1 completes.

A graph should be a good fit for this, but Ray’s task dependencies should also be able to cover it. After reading the Ray documentation, it seems to me that Ray’s tasks and actors are widely used and well maintained, given how much material covers them, while Ray’s Compiled Graph has only a few pages and doesn’t seem to be as well maintained. Should I use Ray’s task/actor dependencies or Compiled Graph for my project? This is pretty important to me because it is a long-term project.

Thanks

Hi @jianglin091, Ray Compiled Graph is in alpha right now, and we are close to a beta release in a few weeks. Compiled Graph is suitable for static, high-performance, or accelerator-centric workloads. Note that it only supports actors, so you’d need to allocate all of these actors beforehand and keep them allocated rather than reclaiming them.

It sounds like your workload is very dynamic and involves a lot of tasks. The Ray Core API should be more suitable.