How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
I am using Ray DAG API to build a simple DAG of tasks based on dependencies specified at runtime. The DAG is very simple, only 5 tasks taking about 50 ms each. This repo shows how the DAG is built, essentially recursively walking up the dependency tree for a requested task and creating IR nodes at each step. The program behaves correctly and execute the tasks as expected, even scheduling tasks that can be executed in parallel (B,C and D,E) in different workers. The total execution time is much higher than expected, the critical path takes ~150 ms, however, the output below shows the total time to execute the graph is ~1s. Am I miss-using the API or is it a known issue?
Test environment: Perlmutter CPU with ray 2.0.0, python 3.7
Running with 2 nodes with 128 threads each (256 cores) Dependency: AlgorithmA -> none Dependency: AlgorithmB -> AlgorithmA Dependency: AlgorithmC -> AlgorithmA Dependency: AlgorithmD -> AlgorithmB AlgorithmC Dependency: AlgorithmE -> AlgorithmB Processing 1 events (execute pid=141681) Total time for AlgorithmC: 48.99ms (execute pid=141707) Total time for AlgorithmA: 50.68ms (execute pid=141707) Total time for AlgorithmB: 48.76ms (execute pid=141707) Total time for AlgorithmE: 48.97ms (execute pid=141681) Total time for AlgorithmD: 49.08ms Total time to schedule: 417.45ms Total time to process: 947.14ms Time between scheduling end and processing end: 529.69ms