Disk Pressure due to accumulating logs and runtime_resources

koziello · July 28, 2023, 3:45pm

How severely does this issue affect your experience of using Ray?

Medium: It makes completing my task significantly more difficult, but I can work around it.

Hello,
We run a Ray cluster on Kubernetes and have noticed the disks filling up over time with logs and runtime resources. As a result, our head node would flip to a disk_pressure error once disk utilization reached 91%. I’ve created a cron task that clears the files of previous sessions from /tmp/ray, preserving only what’s in session_latest.
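For reference, here is a minimal sketch of that kind of cleanup, not the exact cron job. It assumes that session_latest is a symlink to the current session directory and that older sessions sit next to it as session_* directories; the function names (stale_sessions, clean) and the dry_run flag are made up for illustration.

```python
import shutil
from pathlib import Path


def stale_sessions(ray_tmp: Path) -> list[Path]:
    """Return session directories other than the one session_latest points to."""
    latest = ray_tmp / "session_latest"
    # Assumption: session_latest is a symlink to the live session directory.
    # If it is missing or dangling, report nothing rather than risk deleting
    # the session that is currently in use.
    if not latest.exists():
        return []
    keep = latest.resolve()
    return sorted(
        p
        for p in ray_tmp.glob("session_*")
        if p.is_dir() and not p.is_symlink() and p.resolve() != keep
    )


def clean(ray_tmp: Path = Path("/tmp/ray"), dry_run: bool = True) -> list[Path]:
    """Delete stale session directories; with dry_run=True only list them."""
    targets = stale_sessions(ray_tmp)
    for path in targets:
        if not dry_run:
            shutil.rmtree(path, ignore_errors=True)
    return targets


if __name__ == "__main__":
    # Dry run by default: print what would be removed without touching the disk.
    for path in clean():
        print(f"would remove {path}")
```

This only touches directories belonging to previous sessions. It does not check whether anything inside the current session, such as a runtime_env, is still in use.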
What are the consequences of deleting those files? How can we be sure we won’t disrupt a user by deleting a runtime_env that they are using? What is the preferred way to handle this situation?
Thanks!

Please refer to the screenshot
