Fault tolerance with Actors and map_batches

felipebcs · October 20, 2023, 1:28pm

Hi guys! One question regarding Actors and map_batches:

I saw on the Ray "Actors" documentation page (Ray 2.43.0) that we can set max_restarts and max_task_retries in the @ray.remote decorator so that an actor is restarted after a failure. How can I pass these parameters when running map_batches on a dataset pipeline?

Here is an example of an actor class and a call to map_batches:

import ray


class ActorA:
    def __init__(self):
        # init actor here
        pass

    def __call__(self, batch):
        return self.potential_exception_method(batch)

    def potential_exception_method(self, batch):
        # logic here (may raise); placeholder returns the batch unchanged
        return batch


# `dataset` is an existing ray.data.Dataset
dataset.map_batches(ActorA, compute=ray.data.ActorPoolStrategy(1, 4), batch_size=100000)

Oblynx · January 9, 2025, 2:01pm

Hope you’ve figured it out by now! You should be able to pass them directly to map:

ray_pipeline.map(MyActorClass, max_restarts=5, max_task_retries=-1)
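
Applied to the map_batches call from the question, that could look roughly like the sketch below. It relies on map_batches forwarding extra keyword arguments to the underlying actors as Ray remote args. The toy dataset, the batch size, and the names FAULT_TOLERANCE_ARGS and run_pipeline are made up for illustration, and the keyword form of ActorPoolStrategy is an assumption about your Ray version (newer releases may prefer concurrency=(1, 4) instead of compute=...).

```python
class ActorA:
    def __init__(self):
        # Per-actor state; it is rebuilt from scratch if the actor restarts.
        self.calls = 0

    def __call__(self, batch):
        self.calls += 1
        return self.potential_exception_method(batch)

    def potential_exception_method(self, batch):
        # Placeholder logic: pass the batch through unchanged.
        return batch


# Hypothetical name: actor options passed through map_batches as remote args.
# max_restarts=5 allows up to five actor restarts; max_task_retries=-1 retries
# a task interrupted by an actor failure without limit.
FAULT_TOLERANCE_ARGS = {"max_restarts": 5, "max_task_retries": -1}


def run_pipeline():
    # Imported lazily so the class above can be defined and tested without Ray.
    import ray

    dataset = ray.data.range(1000)  # toy dataset, assumption for illustration
    result = dataset.map_batches(
        ActorA,
        compute=ray.data.ActorPoolStrategy(min_size=1, max_size=4),
        batch_size=100,
        **FAULT_TOLERANCE_ARGS,
    )
    # count() triggers execution and returns the number of rows.
    return result.count()
```

Calling run_pipeline() on a machine with Ray installed starts the actor pool and runs the batches through ActorA. Per the Ray actor docs, these two options cover actor restarts and the tasks that were in flight when the actor failed, so whether they help with exceptions raised inside potential_exception_method is worth checking for your version.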
