Question - Inference batching from multiple workers

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Apologies for the repost - I messed up on my end the first time. Anyway:

Hi there,

I’ve gone through the docs and I’m still unsure. Could someone point me in the right direction, or just say whether it’s not possible?

Consider the following situation:

A Policy actor serves inference requests from several (5) self-play workers.

Currently, each worker individually requests inference via:

ref = self.ps.inference.remote(s)                        # submit to the Policy actor
fut: asyncio.Future = asyncio.wrap_future(ref.future())  # bridge the ObjectRef into asyncio
p, v = await fut                                         # await the policy and value outputs
p, v = p.copy(), v.copy()                                # copy out of the read-only object store

Since each worker only sends a single request at a time, the Policy actor ends up running inference with a batch size of 1, which is inefficient.

Is there a way to batch the requests sent to the Policy actor, have it run inference on all of them in a single forward pass, and then route each individual result back to the correct worker?

I’m aware this would introduce some latency, since we’d be waiting for the batch to fill, but I think this could be minimized by setting the batch size to some fraction of the overall number of requests.
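In case it helps clarify what I mean, here's a rough, Ray-free asyncio sketch of the pattern I'm after. All the names here (`BatchedPolicy`, `_run_batch`, etc.) are made up, and the "network" is just a placeholder function; in Ray I imagine this logic would live inside an async actor method, probably with a timeout that flushes partial batches.

```python
import asyncio


class BatchedPolicy:
    """Toy stand-in for the Policy actor: buffers single requests into a batch,
    runs the batch as one call, and routes each result back via its own future."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.pending: list[tuple[int, asyncio.Future]] = []

    async def inference(self, s):
        # Register this request and park the caller on a per-request future.
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((s, fut))
        if len(self.pending) >= self.batch_size:
            # Batch is full: run one "forward pass" and resolve every future.
            batch, self.pending = self.pending, []
            results = self._run_batch([state for state, _ in batch])
            for (_, f), r in zip(batch, results):
                f.set_result(r)
        return await fut  # each caller receives only its own (p, v)

    def _run_batch(self, states):
        # Placeholder "network": policy = s * 2, value = s + 1.
        return [(s * 2, s + 1) for s in states]


async def worker(policy, s):
    p, v = await policy.inference(s)
    return p, v


async def main():
    policy = BatchedPolicy(batch_size=5)
    return await asyncio.gather(*(worker(policy, s) for s in range(5)))


print(asyncio.run(main()))
# -> [(0, 1), (2, 2), (4, 3), (6, 4), (8, 5)]
```

One obvious gap in this sketch: if fewer than `batch_size` requests ever arrive, the parked futures hang forever, which is why a real version would need a flush timeout alongside the size threshold.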

Anyway, thanks!