1. Severity of the issue: (select one)
Low: Annoying but doesn’t hinder my work.
2. Environment:
- Ray version: 2.47.1
- Python version: 3.12.7+gc
- OS: Ubuntu 24.04.1 LTS
- Cloud/Infrastructure: Kubernetes
- Other libs/tools (if relevant): none
3. What happened vs. what you expected:
- Expected: object spilling is triggered only after object store memory usage exceeds the configured limit
- Actual:
I have a cluster of 128 nodes, each with 1.8 TiB of memory. I set RAY_OBJECT_STORE_ALLOW_SLOW_STORAGE=1 and set the object store memory limit to 1.5 TiB per node.
The start command is:
ray start --num-cpus=128 --num-gpus=8 --head --temp-dir=/tmp/ray --port=6379 --system-config='{"object_spilling_config": "{\"type\": \"filesystem\", \"params\": {\"directory_path\": \"/nvme/tmp/ray\"}}"}' --object-store-memory=1649267441664
The total object_store_memory capacity reported by ray status is as expected, since 1.5 TiB × 128 nodes = 192 TiB:
Total Usage:
1025.0/16384.0 CPU (1024.0 used of 1024.0 reserved in placement groups)
1024.0/1024.0 GPU (1024.0 used of 1024.0 reserved in placement groups)
0B/37.95TiB memory
79.01GiB/192.00TiB object_store_memory
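As a sanity check on the capacity math (using the 1649267441664-byte per-node value passed to --object-store-memory and the 128-node count from above):

```python
# Sanity check: per-node object store limit times node count
# should match the 192.00TiB total reported by `ray status`.
PER_NODE_BYTES = 1_649_267_441_664  # value passed to --object-store-memory
NUM_NODES = 128

per_node_tib = PER_NODE_BYTES / 2**40
total_tib = per_node_tib * NUM_NODES

print(f"per node: {per_node_tib:.2f} TiB")  # 1.50 TiB
print(f"cluster:  {total_tib:.2f} TiB")     # 192.00 TiB
```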
When running the workload, I found that worker-0, the node where I started the Ray head, triggers object spilling when memory usage is only about ~300 GiB:
# free -h
               total        used        free      shared  buff/cache   available
Mem:           1.8Ti       338Gi       1.0Ti       238Gi       688Gi       1.5Ti
Swap:             0B          0B          0B
and
# du -sh /nvme/tmp/ray/
812G /nvme/tmp/ray/
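Putting the numbers above together (the used and spilled figures come from the free and du output; the per-node limit is the --object-store-memory value):

```python
# Rough ratio: spilling had already started while this node's memory
# usage was far below the configured 1.5 TiB per-node limit.
LIMIT_GIB = 1_649_267_441_664 / 2**30  # per-node limit, ~1536 GiB
USED_GIB = 338                          # "used" column from `free -h`
SPILLED_GIB = 812                       # size of /nvme/tmp/ray from `du -sh`

print(f"usage when spilling: {USED_GIB / LIMIT_GIB:.0%} of the limit")
print(f"already spilled:     {SPILLED_GIB} GiB")
```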
I set a large object store memory limit to try to avoid object spilling, because the total amount of data collected by the head node is only about 1.1 TB. But in reality I cannot avoid it.
So is this expected? Thanks!