Question - Inference batching from multiple workers

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Apologies for the repost - I messed up on my end the first time. Anyway:

Hi there,

I’ve gone through the docs and I’m still unsure. Could someone point me in the right direction, or just say whether it’s not possible?

Consider the following situation:

A Policy actor serves inference requests from several (5) self-play workers.

Currently, each worker individually requests inference via:

ref = self.ps.inference.remote(s)                        # submit to the Policy actor
fut: asyncio.Future = asyncio.wrap_future(ref.future())  # bridge the ObjectRef into asyncio
p, v = await fut                                         # await the policy and value outputs
p, v = p.copy(), v.copy()                                # copy out of the read-only object store

Since each worker only sends a single request at a time, the Policy actor ends up running inference with a batch size of 1, which is inefficient.

Is there a way to batch the requests sent to the Policy actor, have it run inference on all of them in a single forward pass, and then route each individual result back to the correct worker?

I’m aware this would introduce some latency, since we’d be waiting for the batch to fill, but I think this could be minimized by setting the batch size to some fraction of the overall number of requests.
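In case it helps clarify what I mean, here's a rough, Ray-free asyncio sketch of the pattern I'm after. All the names here (`BatchedPolicy`, `_run_batch`, etc.) are made up, and the "network" is just a placeholder function; in Ray I imagine this logic would live inside an async actor method, probably with a timeout that flushes partial batches.

```python
import asyncio


class BatchedPolicy:
    """Toy stand-in for the Policy actor: buffers single requests into a batch,
    runs the batch as one call, and routes each result back via its own future."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.pending: list[tuple[int, asyncio.Future]] = []

    async def inference(self, s):
        # Register this request and park the caller on a per-request future.
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((s, fut))
        if len(self.pending) >= self.batch_size:
            # Batch is full: run one "forward pass" and resolve every future.
            batch, self.pending = self.pending, []
            results = self._run_batch([state for state, _ in batch])
            for (_, f), r in zip(batch, results):
                f.set_result(r)
        return await fut  # each caller receives only its own (p, v)

    def _run_batch(self, states):
        # Placeholder "network": policy = s * 2, value = s + 1.
        return [(s * 2, s + 1) for s in states]


async def worker(policy, s):
    p, v = await policy.inference(s)
    return p, v


async def main():
    policy = BatchedPolicy(batch_size=5)
    return await asyncio.gather(*(worker(policy, s) for s in range(5)))


print(asyncio.run(main()))
# -> [(0, 1), (2, 2), (4, 3), (6, 4), (8, 5)]
```

One obvious gap in this sketch: if fewer than `batch_size` requests ever arrive, the parked futures hang forever, which is why a real version would need a flush timeout alongside the size threshold.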

Anyway, thanks!