Ray Core DAG API is slow

esseivaju · October 13, 2022, 1:14am

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

Hi,

I am using the Ray DAG API to build a simple DAG of tasks from dependencies specified at runtime. The DAG is very simple: only 5 tasks, each taking about 50 ms. This repo shows how the DAG is built: essentially, it recursively walks up the dependency tree for a requested task and creates IR nodes at each step. The program behaves correctly and executes the tasks as expected, even scheduling tasks that can run in parallel (B, C and D, E) on different workers. However, the total execution time is much higher than expected: the critical path takes ~150 ms, but the output below shows that the total time to execute the graph is ~1 s. Am I misusing the API, or is this a known issue?
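For readers without access to the repo, here is a minimal sketch of the recursive construction described above. It is not the code from the repo: `build_node`, `make_plain_node`, and the tuple-based nodes are hypothetical stand-ins so the example runs without a cluster. With Ray, the factory would presumably be something like `lambda name, parents: execute.bind(name, *parents)`, assuming `execute` is a `@ray.remote` function.

```python
# Dependencies as listed in the program output: task -> tasks it depends on.
DEPENDENCIES = {
    "AlgorithmA": [],
    "AlgorithmB": ["AlgorithmA"],
    "AlgorithmC": ["AlgorithmA"],
    "AlgorithmD": ["AlgorithmB", "AlgorithmC"],
    "AlgorithmE": ["AlgorithmB"],
}


def build_node(name, deps, make_node, cache):
    """Recursively create the node for `name`, creating its parents first.

    `cache` ensures a task shared by several children (e.g. AlgorithmB)
    is created once and reused, so it is not duplicated in the graph.
    """
    if name in cache:
        return cache[name]
    parents = [build_node(p, deps, make_node, cache) for p in deps[name]]
    cache[name] = make_node(name, parents)
    return cache[name]


# Hypothetical stand-in factory: records creation order and returns a
# plain tuple instead of a Ray DAG node.
created = []


def make_plain_node(name, parents):
    created.append(name)
    return (name, tuple(parents))


cache = {}
node_d = build_node("AlgorithmD", DEPENDENCIES, make_plain_node, cache)
node_e = build_node("AlgorithmE", DEPENDENCIES, make_plain_node, cache)
print(created)
```

Sharing the cache between the two requested tasks means AlgorithmD and AlgorithmE both point at the same AlgorithmB node.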

Test environment: Perlmutter CPU with ray 2.0.0, python 3.7

Running with 2 nodes with 128 threads each (256 cores)
Dependency: AlgorithmA -> none
Dependency: AlgorithmB ->  AlgorithmA
Dependency: AlgorithmC ->  AlgorithmA
Dependency: AlgorithmD ->  AlgorithmB AlgorithmC
Dependency: AlgorithmE ->  AlgorithmB
Processing 1 events
(execute pid=141681) Total time for AlgorithmC: 48.99ms
(execute pid=141707) Total time for AlgorithmA: 50.68ms
(execute pid=141707) Total time for AlgorithmB: 48.76ms
(execute pid=141707) Total time for AlgorithmE: 48.97ms
(execute pid=141681) Total time for AlgorithmD: 49.08ms
Total time to schedule: 417.45ms
Total time to process: 947.14ms
Time between scheduling end and processing end: 529.69ms
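The ~150 ms critical-path figure can be checked against the per-task timings in the output above. The sketch below assumes an idealized scheduler (no scheduling overhead, enough workers for all parallel tasks); `finish_time` is a helper introduced here for illustration.

```python
from functools import lru_cache

# Dependencies and per-task durations copied from the output above.
DEPENDENCIES = {
    "AlgorithmA": (),
    "AlgorithmB": ("AlgorithmA",),
    "AlgorithmC": ("AlgorithmA",),
    "AlgorithmD": ("AlgorithmB", "AlgorithmC"),
    "AlgorithmE": ("AlgorithmB",),
}
DURATION_MS = {
    "AlgorithmA": 50.68,
    "AlgorithmB": 48.76,
    "AlgorithmC": 48.99,
    "AlgorithmD": 49.08,
    "AlgorithmE": 48.97,
}
TOTAL_PROCESS_MS = 947.14  # "Total time to process" from the output


@lru_cache(maxsize=None)
def finish_time(name):
    """Earliest finish time of `name`: it starts once all its parents finish."""
    start = max((finish_time(p) for p in DEPENDENCIES[name]), default=0.0)
    return start + DURATION_MS[name]


critical_path_ms = max(finish_time(name) for name in DEPENDENCIES)
overhead_ms = TOTAL_PROCESS_MS - critical_path_ms
print(f"Critical path: {critical_path_ms:.2f}ms")
print(f"Time not explained by task execution: {overhead_ms:.2f}ms")
```

Under these assumptions the longest chain is A -> C -> D at about 148.75 ms, which leaves roughly 800 ms of the measured total unaccounted for by task execution.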

Chen_Shen · October 19, 2022, 5:40pm

@esseivaju
There could be two overheads: (1) scheduling overhead and (2) worker initialization overhead. I wonder whether in your case you are actually being hit by worker initialization overhead.

One way to quickly verify this theory is to rerun the same code again and see whether performance improves.
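A minimal sketch of that check: time the same call several times in one process and compare the first run with the later ones. `time_runs` and `simulated_run` are hypothetical helpers; the simulated workload only stands in for a cold start so the example runs without a cluster. With Ray, `run` would presumably be something like `lambda: ray.get(dag.execute())`, called repeatedly within the same `ray.init()` session.

```python
import time


def time_runs(run, repeats=3):
    """Call `run` several times and return each wall-clock duration in ms."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


# Stand-in workload (an assumption, not Ray): the first call pays a one-off
# 50 ms "initialization" cost, later calls only do the 10 ms of real work.
state = {"warm": False}


def simulated_run():
    if not state["warm"]:
        time.sleep(0.05)
        state["warm"] = True
    time.sleep(0.01)


durations = time_runs(simulated_run)
for i, ms in enumerate(durations, start=1):
    print(f"Run {i}: {ms:.2f}ms")
```

If the later runs are much faster than the first, one-off startup cost such as worker initialization is the likely culprit; if every run is equally slow, the overhead is paid on each execution.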
