Ray normal DAG vs Compiled DAG

In a dynamic execution graph (a normal Ray DAG), task dependencies, resource requirements, and data flow become known only at runtime. Ray schedules tasks and actor calls on the fly as they are submitted (for example, as each input block is processed in Ray Data) and resolves each dependency at that point. Because the full graph is not visible ahead of time, Ray cannot pre-allocate resources or communication channels, so every task pays its own scheduling and communication overhead. Lineage and dependency information is kept in memory for fault tolerance, but it is not used for global optimization or pre-allocation. See Ray Data key concepts and Ray DAG discussion.
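To make the per-task cost concrete, here is a minimal toy sketch in plain Python. It is not Ray code and not how Ray is implemented: `GRAPH`, `run_dynamic`, and the `stats` counters are hypothetical names, and the counters only stand in for the scheduling and dependency-resolution work a dynamic graph repeats on every execution.

```python
from collections import Counter

# Hypothetical toy graph: node name -> (function, names of upstream nodes).
GRAPH = {
    "load": (lambda x: x + 1, []),
    "double": (lambda v: v * 2, ["load"]),
    "square": (lambda v: v * v, ["load"]),
    "merge": (lambda a, b: a + b, ["double", "square"]),
}

stats = Counter()


def run_dynamic(node, x, cache):
    """Resolve dependencies lazily, only when a node is actually reached."""
    if node in cache:
        return cache[node]
    stats["dependency_lookups"] += 1    # graph structure discovered at call time
    fn, deps = GRAPH[node]
    # Source nodes (no upstream deps) receive the raw input instead.
    args = [run_dynamic(d, x, cache) for d in deps] or [x]
    stats["scheduling_decisions"] += 1  # stand-in for per-task scheduling cost
    cache[node] = fn(*args)
    return cache[node]


# Every execution repeats the same discovery and scheduling work.
results = [run_dynamic("merge", x, {}) for x in range(3)]
print(results)                        # [3, 8, 15]
print(stats["scheduling_decisions"])  # 12 (4 nodes x 3 executions)
```

The counts grow linearly with the number of executions, which is the overhead the paragraph above describes: nothing learned on one run is reused on the next.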

With static compilation (Ray Compiled Graph), the entire DAG, including all task dependencies, resource needs, and communication patterns, is known before execution. Ray uses this information to pre-allocate resources, set up communication channels (e.g., shared memory, or NCCL for GPU-to-GPU transfers), and schedule operations in an order that minimizes overhead and avoids deadlocks. This enables optimizations such as overlapping compute with communication, and it can bring per-task orchestration overhead down to well under a millisecond. Because the plan is fixed up front, Ray does not pay scheduling overhead on every task, which improves both throughput and latency. See Compiled Graph docs and Compiled Graph quickstart.
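The idea of planning once and reusing the plan can be sketched with a toy model in plain Python. This is an illustration under stated assumptions, not Ray's implementation: `GRAPH`, `compile_graph`, `run_compiled`, and the `buffers` list are hypothetical, with `buffers` standing in for pre-allocated communication channels.

```python
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical toy graph: node name -> (function, names of upstream nodes).
GRAPH = {
    "load": (lambda x: x + 1, []),
    "double": (lambda v: v * 2, ["load"]),
    "square": (lambda v: v * v, ["load"]),
    "merge": (lambda a, b: a + b, ["double", "square"]),
}

stats = Counter()


def compile_graph(graph, output):
    """One-time planning: fix the execution order and assign one slot per node."""
    deps_only = {name: deps for name, (_, deps) in graph.items()}
    order = list(TopologicalSorter(deps_only).static_order())
    slots = {name: i for i, name in enumerate(order)}
    steps = [
        (graph[name][0], [slots[d] for d in graph[name][1]], slots[name])
        for name in order
    ]
    stats["planning_passes"] += 1
    return steps, slots[output], len(order)


def run_compiled(steps, out_slot, buffers, x):
    """Replay the fixed plan; no dependency lookups or ordering decisions here."""
    for fn, in_slots, dst in steps:
        # Source nodes (no upstream slots) receive the raw input instead.
        args = [buffers[i] for i in in_slots] or [x]
        buffers[dst] = fn(*args)
    return buffers[out_slot]


steps, out_slot, n_slots = compile_graph(GRAPH, "merge")
buffers = [None] * n_slots  # allocated once, reused by every execution
results = [run_compiled(steps, out_slot, buffers, x) for x in range(3)]
print(results)                   # [3, 8, 15]
print(stats["planning_passes"])  # 1
```

Planning happens once regardless of how many times the graph runs, and each run only walks a fixed list of steps over buffers that already exist. In Ray itself, the analogous one-time step is calling `experimental_compile()` on a DAG built with `.bind()`; check the Compiled Graph quickstart for the current API.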

Would you like a step-by-step example of how this information is used in practice?
