I am trying to run data loading and batch prediction for a semantic segmentation model using a `ray.data.Dataset`. My pipeline looks something like this:
```python
BLOCK_SIZE = 128

metadata = ray.data.from_pandas(df)
pipe = (
    metadata
    # Small blocks so that a window of loaded images stays bounded.
    .repartition(metadata.count() // BLOCK_SIZE)
    .window(blocks_per_window=64)  # determined by num CPUs
    # Load one image per metadata row.
    .map(load_data_fn, num_cpus=1)
    .map_batches(
        Predictor,
        fn_constructor_args=(cfg, checkpoint_path),
        batch_size=8,
        num_gpus=1,
        compute=ray.data.ActorPoolStrategy(
            size=8, max_tasks_in_flight_per_actor=4
        ),
    )
)
```
Here, `load_data_fn` is a function that takes a row from the dataframe and returns the corresponding image (as a numpy array inside a dict), roughly as sketched below.
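A minimal sketch of `load_data_fn` (the `"path"` and `"id"` column names and the use of `imageio` are just how my setup happens to look; rows are assumed to behave like plain dicts):

```python
import imageio.v3 as iio
import numpy as np

def load_data_fn(row: dict) -> dict:
    # Read the image referenced by this metadata row from disk.
    image: np.ndarray = iio.imread(row["path"])  # (H, W, C) uint8
    # Keep the row id so predictions can be matched back to the metadata.
    return {"id": row["id"], "image": image}
```

The predictor looks something like this: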
```python
class Predictor:
    def __init__(self, cfg, checkpoint_path):
        print("Model is being setup")
        self.model: torch.nn.Module = load_model(...)

    def __call__(self, data):
        ...
```
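The elided `__call__` is the usual batch-inference shape. A hedged sketch of roughly what it does, assuming batches arrive as a dict of numpy arrays with an `"image"` key holding fixed-size `(B, H, W, C)` images (this depends on the `batch_format` and on my data layout):

```python
import numpy as np
import torch

def predict_batch(model: torch.nn.Module, batch: dict) -> dict:
    # Stack of images produced by load_data_fn, shape (B, H, W, C).
    images = torch.from_numpy(batch["image"]).permute(0, 3, 1, 2).float().cuda()
    with torch.no_grad():
        logits = model(images)  # (B, num_classes, H, W)
    # Per-pixel class ids as the segmentation mask.
    return {"mask": logits.argmax(dim=1).cpu().numpy()}
```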
The reason for the `.repartition(...).window(...)` calls in the pipeline setup is that without them I would easily run out of object store memory, since each image is several MB in size (at, say, ~5 MB per image, a single 128-row block of loaded images already comes to roughly 640 MB).
Currently, I am observing that after each block, the `Predictor` is reloaded. Naturally, reloading the model every time means that GPU utilisation isn't great. Is there any way to reuse the actor objects? Or is there something fundamentally wrong with how I am setting up the pipeline?
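For reference, this is the non-windowed variant (same names as above), which, as far as I understand, keeps the actor pool alive for the whole `map_batches()` call so the model is only constructed once per actor, but which runs me out of object store memory:

```python
# No repartition/window: the actor pool lives for the entire
# map_batches() call, so the model is loaded once per actor.
ds = (
    metadata
    .map(load_data_fn, num_cpus=1)
    .map_batches(
        Predictor,
        fn_constructor_args=(cfg, checkpoint_path),
        batch_size=8,
        num_gpus=1,
        compute=ray.data.ActorPoolStrategy(size=8),
    )
)
```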