“Debugging ray.data”

itswzz8 · November 28, 2025, 2:50pm

1. Severity of the issue: (select one)

Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

Ray version:
Python version:
OS:
Cloud/Infrastructure:
Other libs/tools (if relevant):

3. What happened vs. what you expected:

Expected:
Actual:

“I’m studying parts of Ray’s source code. Since ray.data uses lazy execution, when I debug Ray locally, I can’t get the code to run into Ray’s internal logic. This makes it impossible for me to trace the data flow during execution. How do you usually debug and study Ray in this situation?”

RunLLM · November 28, 2025, 2:51pm

Ray Data is lazy: building a dataset only records a plan, and nothing runs until the dataset is consumed. To step into Ray’s internal logic, trigger execution by iterating over the dataset or by calling a consuming method such as .take(), .show(), or .materialize(). This forces Ray to execute the computation graph, so breakpoints in the internal code paths are actually reached and you can trace the data flow. To inspect your own functions, you can insert print statements, or add breakpoint() inside them and attach from VS Code with the Ray distributed debugger (or use the Ray Debugger) for interactive inspection. For more details, see Ray Debugging Guide and Ray Data Exception Handling.
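As a minimal sketch of the point above, the snippet below builds a small lazy pipeline and only runs it when `take_all()` is called. The names `add_one` and `run_pipeline` are hypothetical and chosen for illustration; the example assumes a recent Ray 2.x release where `ray.data.range` produces rows with an `id` column and `map_batches` accepts `batch_format="numpy"`.

```python
import numpy as np


def add_one(batch):
    """Hypothetical UDF; with batch_format="numpy" a batch is a dict of arrays."""
    # A print (or a breakpoint() call) placed here fires only once the
    # dataset is executed, not when map_batches() is called.
    print(f"add_one called with {len(batch['id'])} rows")
    return {"id": batch["id"] + 1}


def run_pipeline():
    # Imported here so add_one stays testable without a Ray runtime.
    import ray

    ds = ray.data.range(8)                              # lazy: only builds a plan
    ds = ds.map_batches(add_one, batch_format="numpy")  # still lazy
    rows = ds.take_all()                                # consuming call: triggers execution
    # Sort so the result does not depend on block ordering.
    return sorted(int(row["id"]) for row in rows)
```

Calling `run_pipeline()` under a debugger, with a breakpoint set on the `take_all()` line, gives a place to step from user code into Ray's execution path. The UDF itself can also be checked in isolation, for example `add_one({"id": np.arange(3)})`, before involving Ray at all.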

Would you like a step-by-step example of how to set up breakpoints and trigger execution for debugging?

Sources:

Ray Data Exception Handling

