Ray task retry behavior and task ID consistency after worker crash

BrianXu0623 · October 28, 2025, 12:20am

Hello community,

Based on my understanding, a Ray worker node keeps many results of completed tasks in memory. If a worker node crashes, all those results are lost, and other worker nodes will re-execute the failed tasks.

In my implementation, I’m also using an external key-value store to persist each task’s result, where the key is the Ray task ID and the value is the task’s result.

My question is: when another worker retries a task that was previously completed on a crashed node, will the retried task have the same task ID as the original one? If the task IDs are consistent, I could leverage the external storage to achieve idempotence.

Thank you in advance for your help!

RunLLM · October 28, 2025, 12:20am

No, retried tasks in Ray do not have the same task ID as the original attempt. Each retry is assigned a unique task ID, so you cannot rely on task ID consistency to achieve idempotence with external storage. Ray core maintainers have confirmed this. It is not explicitly documented, but it is discussed in the community and can be inferred from Ray’s design and test code, where each retry is treated as a new task attempt with a new ID (discuss.ray.io, task_manager_test.cc).

If you need idempotence, you should use your own deterministic key (e.g., a hash of task arguments) for external storage, not the Ray task ID. Would you like more detail on how to implement this pattern?
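
The deterministic-key pattern described above can be sketched roughly as follows. This is a minimal illustration, not Ray API: `deterministic_key` and `run_idempotent` are hypothetical helper names, a plain `dict` stands in for the external key-value store, and the arguments are assumed to be JSON-serializable. In a real setup, the body of a `@ray.remote` function would presumably call something like `run_idempotent` with a client for the actual store.

```python
import hashlib
import json


def deterministic_key(func_name, args, kwargs):
    """Build a stable key from the function name and its arguments.

    Assumes args/kwargs are JSON-serializable. sort_keys=True makes the
    key independent of keyword-argument ordering.
    """
    payload = json.dumps(
        {"fn": func_name, "args": args, "kwargs": kwargs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_idempotent(store, fn, *args, **kwargs):
    """Return the stored result for this call if present; otherwise compute and store it.

    `store` is any dict-like object standing in for the external KV store.
    """
    key = deterministic_key(fn.__name__, args, kwargs)
    if key in store:
        # A previous attempt already persisted this result; skip the work.
        return store[key]
    result = fn(*args, **kwargs)
    store[key] = result
    return result


if __name__ == "__main__":
    store = {}

    def square(x):
        return x * x

    first = run_idempotent(store, square, 4)   # computes and stores
    second = run_idempotent(store, square, 4)  # served from the store
    print(first, second, len(store))
```

Because the key depends only on the function name and its inputs, any attempt of the same logical call maps to the same entry regardless of which worker runs it. The check-then-write step here is not atomic, so two concurrent attempts could both compute the result; that is harmless only if the function is deterministic and the store tolerates overwriting a key with the same value.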
