To track actual object store usage on a GPU worker node, open the Ray Dashboard’s “Object Store” tab or run `ray memory` on the node; both show which objects are held in the object store and how large each one is (source). For programmatic access, either parse the output of `ray memory` or, if metrics export is enabled, query Prometheus metrics such as `ray_object_store_memory` (source).
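As a rough illustration of the Prometheus route, the sketch below reads a metrics endpoint in the Prometheus text format and sums the samples named `ray_object_store_memory`. It is a minimal sketch, not Ray's own API: the helper names (`fetch_metrics`, `parse_samples`, `total_bytes`) are hypothetical, the endpoint URL is a placeholder that depends on how metrics export is configured on your node (for example via `--metrics-export-port`), and the labels attached to each sample may differ between Ray versions.

```python
import re
import urllib.request

# One sample line of the Prometheus text format: name{labels} value [timestamp]
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value"
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url, timeout=5.0):
    """Download the raw metrics text from an HTTP endpoint (URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text, metric_name):
    """Return a list of (labels_dict, value) for every sample of metric_name."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None or match.group("name") != metric_name:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((labels, float(match.group("value"))))
    return samples


def total_bytes(samples, **label_filter):
    """Sum sample values, keeping only samples whose labels match label_filter."""
    return sum(
        value
        for labels, value in samples
        if all(labels.get(key) == wanted for key, wanted in label_filter.items())
    )


# Example usage (placeholder host and port; adjust to your cluster):
#   text = fetch_metrics("http://<worker-node-ip>:<metrics-port>/metrics")
#   samples = parse_samples(text, "ray_object_store_memory")
#   print(total_bytes(samples))
```

Polling this in a loop gives a simple time series per node. If a Prometheus server is already scraping the cluster, querying that server is usually the better choice than hitting each node's endpoint directly.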
Would you like more detail on using these tools or example code for programmatic monitoring?
Sources:
- https://github.com/ray-project/ray/blob/master/doc/source/ray-observability/user-guides/debug-apps/debug-memory.rst
- https://discuss.ray.io/t/how-to-programatically-do-real-time-monitoring-of-actor-task-resource-usage-heap-memory-obj-store-memory-cpu/8454