Visualize message-passing across cluster

adienes · April 1, 2025, 6:42pm

in a guide I found with tips on using the compiled DAG workflow, there is a suggestion to avoid repeated messaging on the driver node, illustrated with a graphic (not reproduced here)

is there any way to visualize the execution of my workload in such a manner? or more generally any good performance debugging tips for large jobs?

my application works pretty well for small and medium-sized workloads, but it seems to scale poorly as I get closer to the cluster capacity; it almost seems like there are deadlocks of some kind

Mengjin_Yan · April 3, 2025, 3:58pm

@ruisearch42 Can you help with the question?

ruisearch42 · April 5, 2025, 6:32pm

Hi @adienes, you can use the PyTorch profiler or Nsight to profile your Compiled Graph applications; see the Profiling page in the Ray 2.44.1 docs.

What does your application look like? GPU or CPU workload? Computation heavy or communication/IO heavy? A minimal script will help us better understand it.
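
If the profiler output alone does not give a clear picture of who is messaging whom, one generic option is to log the messages yourself and convert them to the Chrome Trace Event JSON format, which chrome://tracing and the Perfetto UI can open as a timeline. The sketch below is not a Ray API and is only an illustration: the `MESSAGES` log, the node names, the timings, and the helper names `to_trace_events` and `write_trace` are all hypothetical.

```python
import json
from collections import Counter

# Hypothetical hand-recorded message log: (source, destination, start_us, duration_us).
# Names and numbers are made up for illustration; they do not come from Ray.
MESSAGES = [
    ("driver", "worker-0", 0, 120),
    ("driver", "worker-1", 130, 110),
    ("worker-0", "worker-1", 260, 90),
    ("driver", "worker-0", 400, 125),
]


def to_trace_events(messages):
    """Convert the log into Chrome Trace Event dicts (metadata + 'complete' events)."""
    nodes = sorted({n for src, dst, _, _ in messages for n in (src, dst)})
    ids = {name: i for i, name in enumerate(nodes)}

    # Metadata events (ph="M") label each process row with the node name.
    events = [
        {"name": "process_name", "ph": "M", "pid": ids[n], "tid": 0,
         "args": {"name": n}}
        for n in nodes
    ]
    for src, dst, start_us, dur_us in messages:
        events.append({
            "name": f"{src} -> {dst}",
            "cat": "message",
            "ph": "X",          # complete event: start time plus duration
            "ts": start_us,     # microseconds
            "dur": dur_us,
            "pid": ids[src],    # one row group per sender
            "tid": ids[dst],    # one lane per receiver within that sender
        })
    return events


def write_trace(messages, path):
    """Write a JSON file that a trace viewer can load."""
    with open(path, "w") as f:
        json.dump({"traceEvents": to_trace_events(messages)}, f, indent=2)


# Quick text summary: a sender with far more messages than the rest
# (here the driver) is a candidate bottleneck.
sends_per_node = Counter(src for src, _, _, _ in MESSAGES)
print(sends_per_node.most_common())
```

Calling `write_trace(MESSAGES, "messages.json")` and loading the file in a trace viewer shows one row per sender, so a driver that sends most of the messages stands out visually.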
