Profiling and Analyzing Ray's Communications Overhead

Hi,
I’m working on profiling communication overhead in Ray (tracking task IDs, node interactions, and data transfer).
So far, I’ve tried or considered the following:

  • Dumping all traffic from Ray instances (container).
  • Considering using RAY_LOG(DEBUG) in key parts of the source code.
  • Considering modifying gRPC logging via grpc/support/log.h.

Before diving deeper, I’d like some advice:

  1. What’s the best way to analyse and profile communication in Ray efficiently?
  2. Is there an existing communication debugging tool or method for Ray that I might be overlooking?
  3. If I were to develop and contribute a communication analysis feature to Ray, what would be the best direction to take?

Thanks in advance.