/tmp/ray/.. is over 95% full, wrongly calculated in Docker?

  • High: It blocks me from completing my task.

I am trying to run my application in Docker, but I get the following message, and shortly afterwards the application fails:

2024-08-21T14:20:53.638777992Z (raylet) [2024-08-21 14:20:53,630 E 233 263] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-08-21_14-08-58_813352_53 is over 95% full, available space: 146678124544; capacity: 3936290357248. Object creation will fail if spilling is required.

It fails with:

2024-08-21T14:21:01.597402880Z The object cannot be created because the local object store is full and the local disk's utilization is over capacity (95% by default).Tip: Use `df` on this node to check disk usage and `ray memory` to check object store memory usage.

But my hard drive is not full: I have over 1 TB of free space on the host. So is the available storage being calculated incorrectly?
Where does this number come from? The reported capacity matches the 4 TB of my host machine.
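
For reference, the percentage can be checked by hand from the two numbers in the raylet log line above. The sketch below is only an illustration: the helper name `utilization` is made up here, and it assumes the threshold is compared against 1 - available / capacity, which is what the wording of the log message suggests. The live check at the end uses Python's `shutil.disk_usage`, not Ray's own code path, so it may differ slightly from what the raylet reports.

```python
import shutil


def utilization(available_bytes: int, capacity_bytes: int) -> float:
    """Return the fraction of the filesystem that is in use (0.0 to 1.0)."""
    return 1.0 - available_bytes / capacity_bytes


# Values copied from the raylet log line in this thread.
available = 146_678_124_544   # roughly 147 GB free
capacity = 3_936_290_357_248  # roughly 3.9 TB total

used = utilization(available, capacity)
print(f"used: {used:.1%} of capacity, free: {available / 1e9:.0f} GB")
print("over the 95% threshold:", used > 0.95)

# The same calculation against a live path. "/" is a placeholder; inside the
# container you would point this at the filesystem that holds /tmp/ray.
live = shutil.disk_usage("/")
print(f"live path used: {utilization(live.free, live.total):.1%}")
```

With these numbers, about 147 GB free out of 3.9 TB is less than 5% of the capacity, so the warning is consistent with the values the raylet saw.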

I read here:
Issue with raylet error · Issue #721 · vllm-project/vllm (github.com)
that it helps to mount the /tmp/ray folder to the host machine… Why, and how?

If you follow the error message and run ‘df’ on the node as well as ‘ray memory’, what does it say?

Embarrassingly, I just did not understand HOW MUCH space is needed to spill. I thought I was fine with 500 GB.

Using a new 2 TB NVMe drive as the spill target, I saw that Ray spilled over 700 GB for the 170 GB point cloud.

So my drive actually was full…

My bad…
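
For anyone landing here with the same problem, this is a minimal sketch of pointing Ray's object spilling at a dedicated drive, as described above. It assumes the `object_spilling_config` entry of `_system_config` described in Ray's object spilling documentation, which may differ between Ray versions. The mount point `/mnt/nvme/ray_spill` and the helper names `spilling_config` and `init_ray` are hypothetical and not taken from this thread.

```python
import json


def spilling_config(directory_path: str) -> dict:
    """Build a _system_config dict that sends filesystem spilling to directory_path."""
    return {
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": directory_path}}
        )
    }


def init_ray(directory_path: str) -> None:
    """Start Ray with spilling redirected (assumes Ray is installed)."""
    import ray  # imported lazily so spilling_config() works without Ray

    ray.init(_system_config=spilling_config(directory_path))


# Hypothetical mount point of the dedicated spill drive.
SPILL_DIR = "/mnt/nvme/ray_spill"
config = spilling_config(SPILL_DIR)
print(config)
```

Calling `init_ray(SPILL_DIR)` once, before any objects are created, would apply the setting. In Docker the directory also has to be mounted into the container, otherwise the spilled data still lands on the container's own filesystem.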

NP - glad you figured it out!