Hi,
I’m working on communication profiling and overheads in Ray (tracking taskID, node interactions, and data transfer).
So far, I’ve tried or considered the following:
- Dumping all network traffic from the Ray instances (running in containers).
- Adding RAY_LOG(DEBUG) statements in key parts of the source code.
- Modifying gRPC logging via grpc/support/log.h.
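To make the goal concrete, here is a minimal sketch in plain Python of the kind of per-task summary I'm after. It does not use any Ray API: the `TransferRecord` fields and the `summarize_transfers` function are hypothetical names, standing in for whatever a traffic dump or the debug logs would actually yield.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class TransferRecord(NamedTuple):
    """One observed data transfer (hypothetical schema, not a Ray type)."""
    task_id: str
    src_node: str
    dst_node: str
    num_bytes: int


def summarize_transfers(
    records: Iterable[TransferRecord],
) -> Dict[Tuple[str, str, str], int]:
    """Sum transferred bytes per (task_id, src_node, dst_node)."""
    totals: Dict[Tuple[str, str, str], int] = defaultdict(int)
    for rec in records:
        totals[(rec.task_id, rec.src_node, rec.dst_node)] += rec.num_bytes
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample records, only to show the shape of the output.
    sample = [
        TransferRecord("task-a", "node-1", "node-2", 1024),
        TransferRecord("task-a", "node-1", "node-2", 2048),
        TransferRecord("task-b", "node-2", "node-3", 512),
    ]
    for key, total in sorted(summarize_transfers(sample).items()):
        print(key, total)
```

The open question is how to obtain records like these from Ray reliably, rather than reconstructing them from packet captures.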
Before diving deeper, I’d like some advice:
- What’s the best method to analyse and profile communications in Ray efficiently?
- Is there an existing communication debugging tool or method for Ray that I might be overlooking?
- If I were to develop and contribute a communication analysis feature to Ray, what would be the best direction to take?
Thanks in advance.