Job API is very slow when using external redis

mingshi · September 21, 2023, 1:19am

Hi Ray experts,

We are running Ray 2.7.0 with an external Redis instance backing the GCS. We can submit jobs and view historical jobs on the job page of the dashboard UI. However, when we simulated a head node failure by killing the head pod with “kubectl delete pod {head_pod_name}”, the job page hangs after the head node has started again: it takes a very long time for the page to load the job list. We also found that the issue goes away if the Redis data is cleared with the “FLUSHALL” command.

We need help debugging this issue further.
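
To put a number on the slowdown, one option is to time the job-list request directly rather than watching the dashboard page. The snippet below is only a minimal sketch: the dashboard address `http://127.0.0.1:8265` and the `/api/jobs/` path are assumptions about a default setup, and `jobs_url`, `time_call`, `list_jobs`, and `main` are hypothetical helper names, not part of Ray.

```python
import json
import time
import urllib.request


def jobs_url(dashboard_address: str) -> str:
    """Build the job-list URL. The /api/jobs/ path is an assumption."""
    return dashboard_address.rstrip("/") + "/api/jobs/"


def time_call(fn):
    """Run fn() once and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


def list_jobs(dashboard_address: str, timeout_s: float = 300.0):
    """Fetch the job list from the dashboard and parse it as JSON."""
    url = jobs_url(dashboard_address)
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)


def main():
    # Assumed address: adjust to wherever the head node's dashboard is exposed.
    address = "http://127.0.0.1:8265"
    elapsed, jobs = time_call(lambda: list_jobs(address))
    print(f"listed {len(jobs)} jobs in {elapsed:.2f}s")
```

Calling `main()` once after the head pod restart and once more after “FLUSHALL” would give two timings to compare.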

Thanks!
Mingshi

Huaiwei_Sun · September 24, 2023, 5:08am

This seems like a bug? cc: @Kai-Hsun_Chen @sangcho

mingshi · September 25, 2023, 9:26pm

Other users have seen this issue as well (see this Slack thread), so it does seem like a bug. It would be great if someone could offer help here.

Thanks again!

Kai-Hsun_Chen · September 26, 2023, 5:51pm

I will reply in the Slack thread.
