1. Severity of the issue: (select one)
Medium: Significantly affects my productivity, but I can find a workaround.
2. Environment:
- Ray version: 2.47.1
- Python version: 3.11
- OS: Ubuntu 22.04
- Cloud/Infrastructure: local workstation (5070 Ti GPU, 64 GB RAM, AMD64)
- Other libs/tools (if relevant): None
3. What happened vs. what you expected:
- Expected: I'm new to Ray. I'd like to know (a) how to freeze a local Ray cluster when it nears out-of-memory, instead of having the memory monitor kill all actors and tasks, and (b) how to drill down into a specific actor for memory analysis and get detailed information about its objects in Plasma and on the heap. (Rough sketches of what I'm hoping for are below, after the log; please correct me if they're off.)
- Actual: the job exits immediately, leaving only the actor-level out-of-memory report shown below.
Thanks in advance to anyone willing to share their experience.
```
Traceback (most recent call last):
File "/home/inbreeze/PycharmProjects/DataPlatformOnRay/python/raypipe/runner.py", line 65, in <module>
logger.info(f"[DataPlatformOnRay] final return: {ray.get(mergeActor.run.remote())}")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ray-dev/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ray-dev/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ray-dev/lib/python3.11/site-packages/ray/_private/worker.py", line 2849, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ray-dev/lib/python3.11/site-packages/ray/_private/worker.py", line 939, in get_objects
raise value
ray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory.
Memory on the node (IP: 10.2.97.17, ID: 89f4f4b4c16742cd9053e68d1513923883441cf068bfc2667e81c669) where the task (actor ID: 0f28f36dc1a20a125cb8a86f05000000, name=ResultMergeActor.__init__, pid=268816, memory used=0.06GB) was running was 61.76GB / 62.62GB (0.98628), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: 5d4a0cc4958f36a555306298c10f9043ec52bfd14067810dbc5f4a9d) because it was the most recently scheduled task; to see more information about memory usage on this node, use `ray logs raylet.out -ip 10.2.97.17`. To see the logs of the worker, use `ray logs worker-5d4a0cc4958f36a555306298c10f9043ec52bfd14067810dbc5f4a9d*out -ip 10.2.97.17. Top 10 memory users:
PID MEM(GB) COMMAND
263393 14.02 ray::SceneCutRouter.submit
268660 11.42 ray::AesActor.process_with_queue
268785 11.22 ray::OptFlowActor.process_with_queue
263388 6.15 ray::SceneCutMemActor
268686 4.95 ray::OCRActor
54419 3.07 /home/inbreeze/.cache/JetBrains/RemoteDev/dist/461d91da9a280_pycharm-2025.2.0.1/bin/remote-dev-serve...
263297 0.47 /opt/miniconda3/envs/ray-dev/bin/python /home/inbreeze/PycharmProjects/DataPlatformOnRay/python/rayp...
268778 0.41 ray::LaplacianActor.process_with_queue
263394 0.29 ray::SceneCutMemActor
70074 0.28 /home/inbreeze/.cache/JetBrains/PyCharm2025.2/full-line/models/7b957b12-0866-32e2-985d-1542c7c2aeee/...
```
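For question (a), my current understanding from the OOM-prevention docs (not yet verified on 2.47.1) is that the memory monitor that killed the worker above can be disabled, or its default 0.95 threshold raised, via environment variables that must be set before the node starts. A minimal sketch of what I mean, assuming these variables behave as documented:

```python
import os

# Must be set in the environment of the process that starts the Ray node,
# i.e. before ray.init() for a local cluster (or before `ray start`).
# 0 disables the memory monitor entirely, so Ray stops killing workers when
# the node runs low on memory (the OS OOM killer then becomes the backstop).
os.environ["RAY_memory_monitor_refresh_ms"] = "0"

# Alternative: keep the monitor but raise the kill threshold from the
# default 0.95, to leave more time to attach and inspect.
# os.environ["RAY_memory_usage_threshold"] = "0.99"

import ray

ray.init()
```

Is this the recommended way to "freeze" the cluster for debugging, or is there a way to pause the killing at runtime?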
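For the Plasma side of question (b), as far as I can tell `ray memory` (CLI) and the Ray 2.x state API (`ray.util.state`) can list object references per worker process. A sketch of what I'm trying; the filter keys ("state", "pid") are my best reading of that API, and PID 263393 is the ray::SceneCutRouter.submit worker from the report above:

```python
import ray
from ray.util.state import list_actors, list_objects

ray.init(address="auto")  # attach to the already-running local cluster

# Find the actor of interest (class name, pid, node) among live actors.
for actor in list_actors(filters=[("state", "=", "ALIVE")]):
    print(actor)  # fields include actor_id, class_name, pid, node_id

# List object references held by that worker process, e.g. the
# ray::SceneCutRouter.submit worker (pid 263393 in the report above).
for obj in list_objects(filters=[("pid", "=", 263393)], limit=1000):
    print(obj)  # fields include object_id, object_size, reference_type
```

Is this the intended way to get per-actor Plasma detail, or should I be using `ray memory --stats-only` / the dashboard instead?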
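For the heap side of (b), I'm considering wrapping the suspect actor with memray. Everything here is a hypothetical sketch of my own: `SuspectActor`, `start_heap_capture`, and the output path are placeholder names, and I'm assuming memray (pip install memray) is available on the node that runs the actor:

```python
import memray
import ray


@ray.remote
class SuspectActor:  # placeholder for e.g. SceneCutRouter / AesActor
    def __init__(self):
        self._tracker = None

    def start_heap_capture(self, path="/tmp/suspect_actor_heap.bin"):
        # memray allows only one active Tracker per process, so enter it once
        # and keep it open until stop_heap_capture() is called.
        self._tracker = memray.Tracker(path)
        self._tracker.__enter__()

    def stop_heap_capture(self):
        self._tracker.__exit__(None, None, None)
        self._tracker = None

    def process(self, batch):
        ...  # the actor's real workload goes here
```

After calling stop_heap_capture I'd inspect the capture on that node with `memray flamegraph /tmp/suspect_actor_heap.bin`. Is there a more Ray-native way to get per-actor heap detail?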