Visualize message-passing across cluster

In a guide I found with some tips on how to use the compiled DAG workflow, there is a suggestion to avoid repeated messaging on the driver node, illustrated with a graphic.

Is there any way to visualize the execution of my workload in such a manner? Or, more generally, are there any good performance debugging tips for large jobs?

My application works pretty well for small and medium-sized workloads, but it seems to scale poorly as I get closer to the cluster capacity. It almost seems like there are deadlocks of some kind.

@ruisearch42 Can you help with the question?

Hi @adienes, you can use the PyTorch profiler or Nsight to profile your Compiled Graph applications: Profiling — Ray 2.44.1
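If you mainly want a rough picture of who is messaging whom, you could also count messages by hand and render the result with Graphviz. Below is a minimal sketch, assuming you can add a logging call at each send site in your own code. `MessageTrace`, `record`, `traffic_through`, and `to_dot` are hypothetical helpers written for this example, not part of Ray, and the node names are made up.

```python
from collections import Counter


class MessageTrace:
    """Hypothetical helper (not a Ray API): counts messages per (sender, receiver) pair."""

    def __init__(self):
        self.edges = Counter()

    def record(self, src, dst):
        # Call this wherever your code sends a message or passes an object from src to dst.
        self.edges[(src, dst)] += 1

    def traffic_through(self, node):
        # Total number of messages sent or received by one node, e.g. the driver.
        return sum(n for (s, d), n in self.edges.items() if node in (s, d))

    def to_dot(self):
        # Emit Graphviz DOT text, one labeled edge per (sender, receiver) pair.
        lines = ["digraph messages {"]
        for (s, d), n in sorted(self.edges.items()):
            lines.append(f'  "{s}" -> "{d}" [label="{n}"];')
        lines.append("}")
        return "\n".join(lines)


# Toy pattern: the driver relays every intermediate result between two workers.
trace = MessageTrace()
for _ in range(3):
    trace.record("driver", "worker_a")
    trace.record("worker_a", "driver")
    trace.record("driver", "worker_b")
    trace.record("worker_b", "driver")

print(trace.traffic_through("driver"))
print(trace.to_dot())
```

Saving the DOT text to a file and running `dot -Tpng trace.dot -o trace.png` gives a picture where heavy edges into and out of the driver stand out. This only counts messages you log yourself, so it complements the profilers rather than replacing them.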

What does your application look like? GPU or CPU workload? Computation heavy or communication/IO heavy? A minimal script will help us better understand it.