How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty in completing my task, but I can work around it.

I find that when I submit many jobs to a RayCluster (running with Redis), the job table and workers table never clear their oldest items, which 1) costs much more memory in Redis, and 2) makes the dashboard's GET /api/jobs
endpoint run slowly.
Is there any plan to fix this?
Hi there wangxin,
There doesn’t seem to be a direct fix mentioned in the docs, but there are a few things you can try to keep things running smoothly. First, check that your cluster is handling memory efficiently - Ray has some built-in tools for managing memory and avoiding out-of-memory issues. The dashboard can also give you a good sense of what’s eating up resources, so keeping an eye on that might help.
If old job and worker entries are piling up, you could look into ways to periodically clear them out or tweak caching settings to keep memory usage in check. I’ll attach some docs that might be helpful down below. Do you have any sort of garbage collection right now in your jobs?
I’ll also go take a look at the GitHub issues and see if anyone’s mentioned it.
Docs:
Thanks for your reply.
I haven't found a workaround for this issue either. The memory overhead isn't serious for me, but the GET /api/jobs
endpoint becoming slow is what bothers me. For now, I will clear the job info in Redis periodically to avoid this issue.
I think it would be perfect if the job and worker info could be cleared periodically by the GCS itself.
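For anyone else landing here, a periodic cleanup script along those lines might look like the sketch below. Be aware this is a sketch under assumptions, not Ray's actual schema: the key pattern `JOB:*`, the `keep_newest` threshold, and the use of Redis `OBJECT IDLETIME` as a recency proxy are all placeholders. Inspect your Redis with `SCAN` to find the real key prefixes your Ray version writes before deleting anything.

```python
def stale_keys(keys_with_ts, keep_newest):
    """Return the keys to delete, keeping only the `keep_newest` most recent.

    `keys_with_ts` is a list of (key, timestamp) pairs, where larger
    timestamps mean more recent entries.
    """
    ordered = sorted(keys_with_ts, key=lambda kv: kv[1], reverse=True)
    return [key for key, _ in ordered[keep_newest:]]


def trim_job_table(host="127.0.0.1", port=6379,
                   pattern="JOB:*", keep_newest=500):
    """Delete all but the most recent entries matching `pattern`.

    `pattern` is a hypothetical prefix -- replace it with whatever key
    prefix your Ray GCS actually uses for job/worker table entries.
    """
    import redis  # redis-py; imported lazily so the pure helper above is testable alone

    r = redis.Redis(host=host, port=port)
    entries = []
    for key in r.scan_iter(match=pattern, count=1000):
        # OBJECT IDLETIME is seconds since the key was last touched;
        # negate it so "less idle" sorts as "more recent".
        idle = r.object("idletime", key) or 0
        entries.append((key, -idle))
    doomed = stale_keys(entries, keep_newest)
    if doomed:
        r.delete(*doomed)
    return len(doomed)


if __name__ == "__main__":
    # Point this at the Redis instance backing the GCS, then run it
    # from cron or a Ray cluster-side job at whatever interval suits you.
    print(f"deleted {trim_job_table()} stale entries")
```

Running this on a schedule would keep both the Redis footprint and the GET /api/jobs response time bounded, at the cost of losing history for the trimmed jobs.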