How severely does this issue affect your experience of using Ray?
Medium: It adds significant difficulty to completing my task, but I can work around it.
When doing inference with a PyTorch model in Ray Data, I often use the following pattern:
import torch

class InferenceActor:
    def __init__(self):
        ...  # model setup elided: loads the model as self.model and sets self.device

    def __call__(self, data):
        # Copy the batch host -> device, run the model, copy the result back.
        x = torch.from_numpy(data["x"]).to(self.device)
        with torch.no_grad():
            y = self.model(x)
        return {"y": y.cpu().numpy()}

pipe = (
    ...
    .map(data_load_fn)
    .map_batches(InferenceActor, num_gpus=1)
)
However, I can't get good GPU utilisation out of this, especially for small models, because of the host-to-device transfer at the start of the actor's __call__ method. Is there some way to overlap inference with those memory transfers? I found a workaround by simply allocating multiple copies of the same actor on a single GPU, but that only works if the model itself isn't too big.
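Concretely, the workaround looks roughly like this (a sketch only, assuming a Ray Data version where map_batches takes a concurrency argument; the fractional num_gpus value is what lets both actor copies land on the same physical GPU):

pipe = (
    ...
    .map(data_load_fn)
    .map_batches(
        InferenceActor,
        num_gpus=0.5,   # each copy reserves half a GPU...
        concurrency=2,  # ...so two copies of the model share one physical GPU
    )
)

This keeps the GPU busier because one copy can run inference while the other is doing its host-to-device copy, but it only works if the model weights fit in GPU memory twice.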
Thanks for your reply! Yeah, that's where the question comes from: I've used iter_torch_batches in conjunction with ray.train. Do you have an example where it's used similarly to my example above? In the single-GPU inference case it's pretty easy, but if we set concurrency > 1 on map_batches in the example above, I don't see how I could replicate that with iter_torch_batches without re-implementing a lot of functionality.
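For reference, the single-GPU case I mean is roughly this (a minimal sketch, assuming ds is the dataset after .map(data_load_fn) and that the model is already on the GPU; iter_torch_batches moves each batch to the given device, and prefetch_batches lets the next transfer overlap with the current forward pass):

import torch

with torch.no_grad():
    for batch in ds.iter_torch_batches(
        batch_size=256,
        device="cuda",
        prefetch_batches=2,  # overlap the next host-to-device copy with inference
    ):
        y = model(batch["x"])
        # ... consume y.cpu() ...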
In this case, you can pass the max_concurrency Ray remote arg to map_batches(). For example, map_batches(..., max_concurrency=2) will prefetch one extra batch (one actor will have the current batch, one actor will have the next batch).
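Applied to your pipeline, that would look something like this (a sketch, not tested; max_concurrency is forwarded to the underlying actor as a Ray remote arg, which makes it a threaded actor, so __call__ needs to be safe to run from two threads at once):

pipe = (
    ...
    .map(data_load_fn)
    .map_batches(
        InferenceActor,
        num_gpus=1,
        max_concurrency=2,  # allow 2 concurrent __call__ invocations on the actor
    )
)

With two calls in flight, the host-to-device copy for the next batch can overlap with inference on the current one.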