"await" vs "asyncio.gather" when making multiple calls to Deployment

I’m trying to use the model composition design pattern in Ray Serve but I’m struggling to come up with the best solution when dealing with multiple calls to sub-models.

Which of these solutions would you say is more Ray-thonic (or at least uses “await” / “asyncio.gather” correctly)?

  1. Using asyncio.gather:
@serve.deployment
class Model:
    ...
    async def __call__(self, texts: list[str]):
        tasks = [self.predictor_handle.remote(self.tokeniser_handle.remote(t)) for t in texts]
        refs = await asyncio.gather(*tasks)
        results = await asyncio.gather(*refs)
        return results
  2. Awaiting individual asyncio.Tasks/ray.ObjectRefs:
@serve.deployment
class Model:
    ...
    async def __call__(self, texts: list[str]):
        tasks = [self.predictor_handle.remote(self.tokeniser_handle.remote(t)) for t in texts]
        refs = [await t for t in tasks]
        results = [await r for r in refs]
        return results

I’m not very experienced with asyncio stuff so I’m not sure which would be faster at scale or have the least blocking calls. My initial tests haven’t been very conclusive. It would be great if the Ray community could help me out!
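
To separate the asyncio question from the Ray one, here is a minimal pure-asyncio sketch of the two patterns (no Ray involved). Everything in it is hypothetical: `fake_call` is just a stand-in for a remote model call, and it assumes the handle calls behave like already-scheduled asyncio.Tasks, which may not match what Serve does in every version. Under that assumption both patterns should take roughly the time of the slowest call, since the work is already running before the first `await`. If the calls were bare coroutines rather than tasks, the second pattern would run them one after another instead.

```python
import asyncio
import time


async def fake_call(x: int, delay: float = 0.1) -> int:
    # Hypothetical stand-in for a remote model call: wait, then return a result.
    await asyncio.sleep(delay)
    return x * 2


async def with_gather(xs):
    # Schedule every call up front, then wait for all of them together.
    tasks = [asyncio.create_task(fake_call(x)) for x in xs]
    return list(await asyncio.gather(*tasks))


async def with_individual_awaits(xs):
    # Schedule every call up front, then await each task in order.
    # The tasks are already running, so this does not serialise the work.
    tasks = [asyncio.create_task(fake_call(x)) for x in xs]
    return [await t for t in tasks]


async def timed(fn, xs):
    # Run one pattern and return (results, elapsed seconds).
    start = time.perf_counter()
    results = await fn(xs)
    return results, time.perf_counter() - start


def compare(n: int = 5):
    xs = list(range(n))
    gathered, t_gather = asyncio.run(timed(with_gather, xs))
    awaited, t_await = asyncio.run(timed(with_individual_awaits, xs))
    return gathered, t_gather, awaited, t_await


if __name__ == "__main__":
    gathered, t_gather, awaited, t_await = compare()
    print(f"gather:            {t_gather:.3f}s -> {gathered}")
    print(f"individual awaits: {t_await:.3f}s -> {awaited}")
```

One difference worth noting is error handling: `asyncio.gather` raises the first exception as soon as it occurs, while awaiting in order only surfaces a failure once the loop reaches that task.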