Ray Actor failover

sangcho · August 20, 2021, 6:19pm

Hi @sangcho, I have a follow-up question. In my previous use case, if the actor holds references to data I put into Ray’s distributed object store (using ray.put), is there any way to recover those references after a failover? (I guess the data is still in the plasma store.)

Under our reference counting protocol, if the owner of an object (the first worker that creates it) dies, the object’s data in the plasma store is destroyed. Unfortunately, in this case you need to implement checkpointing to recover the data and put it into the object store again. For task-based workloads, we now support fault tolerance for objects (see Fault Tolerance in the Ray 3.0.0.dev0 docs).
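
To make the checkpoint-and-put-again idea concrete, here is a minimal sketch, not an official Ray pattern. The class name CheckpointedHolder, the checkpoint_dir layout, and the injected put_fn are all hypothetical; put_fn stands in for ray.put so that the recovery logic can be read and run without a cluster. It also assumes keys are safe to use as file names and that values are picklable.

```python
import os
import pickle


class CheckpointedHolder:
    """Holds object references and checkpoints the underlying values to disk.

    Hypothetical helper: `put_fn` is expected to be `ray.put` when this runs
    inside a Ray actor; it is injected so the logic does not depend on Ray.
    """

    def __init__(self, checkpoint_dir, put_fn):
        self.checkpoint_dir = checkpoint_dir
        self.put_fn = put_fn
        self.refs = {}
        os.makedirs(checkpoint_dir, exist_ok=True)
        # On (re)start, reload every checkpointed value and put it again,
        # because references owned by the dead process are no longer usable.
        for name in os.listdir(checkpoint_dir):
            if name.endswith(".pkl"):
                with open(os.path.join(checkpoint_dir, name), "rb") as f:
                    value = pickle.load(f)
                self.refs[name[:-len(".pkl")]] = self.put_fn(value)

    def put(self, key, value):
        # Write the checkpoint first (temp file + rename, so a crash does not
        # leave a half-written file), then create the in-memory reference.
        path = os.path.join(self.checkpoint_dir, key + ".pkl")
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(value, f)
        os.replace(tmp_path, path)
        self.refs[key] = self.put_fn(value)
        return self.refs[key]

    def keys(self):
        return sorted(self.refs)
```

In a real deployment the class would be turned into an actor with ray.remote and a non-zero max_restarts, so that Ray reruns the constructor after a failure and the constructor repopulates the references. The checkpoint directory would need to live on storage that survives the failure (for example a shared filesystem if the whole node can die). The new references are different ObjectRefs from the lost ones, so anything that held the old ones has to ask the actor again.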

By the way, from the thread “Get actor handle by ObjectRef”, it seems the ray.objects API returns all shared objects, but I can’t find this API in the latest Ray. Is it still supported? If not, is there any other way to achieve what ray.objects does?

We have recently put more effort into stabilizing the APIs, and this API has been deprecated. What’s your use case for it? If you just want to see which objects are in the cluster, there’s a CLI command, ray memory, that displays information about all objects in the cluster.
