Ray Core DAG API is slow

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.


I am using the Ray DAG API to build a simple DAG of tasks from dependencies specified at runtime. The DAG is very simple: only 5 tasks, each taking about 50 ms. This repo shows how the DAG is built, essentially by recursively walking up the dependency tree for a requested task and creating IR nodes at each step. The program behaves correctly and executes the tasks as expected, even scheduling tasks that can run in parallel (B, C and D, E) on different workers. However, the total execution time is much higher than expected: the critical path takes ~150 ms, but the output below shows that the total time to execute the graph is ~1 s. Am I misusing the API, or is this a known issue?

Test environment: Perlmutter CPU with Ray 2.0.0 and Python 3.7

Running with 2 nodes with 128 threads each (256 cores)
Dependency: AlgorithmA -> none
Dependency: AlgorithmB ->  AlgorithmA
Dependency: AlgorithmC ->  AlgorithmA
Dependency: AlgorithmD ->  AlgorithmB AlgorithmC
Dependency: AlgorithmE ->  AlgorithmB
Processing 1 events
(execute pid=141681) Total time for AlgorithmC: 48.99ms
(execute pid=141707) Total time for AlgorithmA: 50.68ms
(execute pid=141707) Total time for AlgorithmB: 48.76ms
(execute pid=141707) Total time for AlgorithmE: 48.97ms
(execute pid=141681) Total time for AlgorithmD: 49.08ms
Total time to schedule: 417.45ms
Total time to process: 947.14ms
Time between scheduling end and processing end: 529.69ms
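The ~150 ms expectation can be checked directly from the dependencies and per-task timings printed above. The sketch below is plain Python with no Ray involved; it assumes unlimited workers and zero scheduling cost, and the names `DURATION_MS` and `finish_time` are made up for this illustration.

```python
from functools import lru_cache

# Dependencies and per-task durations (ms), copied from the log output above.
DEPENDENCIES = {
    "AlgorithmA": [],
    "AlgorithmB": ["AlgorithmA"],
    "AlgorithmC": ["AlgorithmA"],
    "AlgorithmD": ["AlgorithmB", "AlgorithmC"],
    "AlgorithmE": ["AlgorithmB"],
}
DURATION_MS = {
    "AlgorithmA": 50.68,
    "AlgorithmB": 48.76,
    "AlgorithmC": 48.99,
    "AlgorithmD": 49.08,
    "AlgorithmE": 48.97,
}


@lru_cache(maxsize=None)
def finish_time(name):
    """Earliest finish time of a task: it starts once all its parents finish."""
    start = max((finish_time(dep) for dep in DEPENDENCIES[name]), default=0.0)
    return start + DURATION_MS[name]


# The critical path is the latest finish time over all tasks.
critical_path_ms = max(finish_time(name) for name in DEPENDENCIES)
print(f"critical path: {critical_path_ms:.2f} ms")  # critical path: 148.75 ms
```

The longest chain is AlgorithmA -> AlgorithmC -> AlgorithmD, so roughly 800 ms of the measured 947.14 ms is not spent inside the tasks themselves.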

There could be two sources of overhead: (1) scheduling overhead and (2) worker initialization overhead. I wonder whether, in your case, you are actually hitting the worker initialization overhead.

One quick way to verify this theory is to rerun the same code and see whether performance improves.