Ray Serve get Header / Dynamic Batching with FastAPI

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello everyone,
thanks for building this awesome framework. It has helped me a lot recently with deploying large open-source language models.
I’m now building a text embedding server with Ray Serve, and specifically I want to use its dynamic batching feature. My code works, but I have two questions:

  1. Does Ray Serve dynamic batching work with a FastAPI ingress? I haven’t seen any example of how to do that. For now I use Ray Serve on its own, without any FastAPI integration, to get this feature working.

  2. When using dynamic batching, how can I read the request headers, not only the request body?
    I would like to add bearer-token authentication to my server.
    Currently my code, without headers, looks like this:

@serve.batch(max_batch_size=8, batch_wait_timeout_s=0.5)
async def handle_batch(self, request_list):
    for id, request in enumerate(request_list):
        try:
            content = await request.json()
    ....

async def __call__(self, request: Request) -> List[str]:
    return await self.handle_batch(request)

How can I add the headers to that? I tried:

@serve.batch(max_batch_size=8, batch_wait_timeout_s=0.5)
async def handle_batch(self, request_list, header_list):
    for id, request in enumerate(request_list):
        try:
            content = await request.json()
    ....

async def __call__(self, request: Request, header: Header) -> List[str]:
    return await self.handle_batch(request, header)

but it keeps failing with:

traceback (most recent call last):
(ServeReplica:default_Embedding_Server pid=11547)   File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/replica.py", line 633, in invoke_single
(ServeReplica:default_Embedding_Server pid=11547)     result = await method_to_call(*request_args, **request_kwargs)
(ServeReplica:default_Embedding_Server pid=11547) TypeError: Embedding_Server.__call__() missing 1 required positional argument: 'header'

Thanks in advance. It would help me a lot if I could solve this.

Hi @tattrongvu, welcome to the forums! Glad to hear that Ray Serve has been helpful.

Yes, dynamic batching does work with a FastAPI ingress! See this section in the docs for an example that uses batching and FastAPI together.

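For the second question, you need to access the headers through the request object instead of adding a separate parameter. The request objects are Starlette requests, so every item in request_list carries its own headers. The Starlette docs show how to access them.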

For example: request.headers['content-type']
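The TypeError in the question appears to come from Ray Serve calling __call__ with only the request, so the extra header parameter is never supplied. A minimal sketch of the alternative, reading the bearer token from each request inside the batch, could look like the following. It uses only the standard library so it runs on its own: FakeRequest is a stand-in for a Starlette Request, and EXPECTED_TOKEN and bearer_token are hypothetical names. In a real deployment handle_batch would be a method carrying the @serve.batch decorator.

```python
import asyncio
from typing import Any, Dict, List, Mapping, Optional

EXPECTED_TOKEN = "secret-token"  # hypothetical; load from config in practice


def bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from 'Authorization: Bearer <token>', else None."""
    value = headers.get("authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


class FakeRequest:
    """Stand-in for a Starlette Request: has .headers and an async .json()."""

    def __init__(self, headers: Dict[str, str], body: Dict[str, Any]):
        self.headers = headers
        self._body = body

    async def json(self) -> Dict[str, Any]:
        return self._body


# In a deployment this would be a method decorated with
# @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.5).
async def handle_batch(request_list: List[Any]) -> List[Dict[str, Any]]:
    results: List[Dict[str, Any]] = []
    for request in request_list:
        # Reject per item instead of raising, so one bad request does not
        # fail the whole batch and the output stays aligned with the input.
        if bearer_token(request.headers) != EXPECTED_TOKEN:
            results.append({"error": "unauthorized"})
            continue
        content = await request.json()
        results.append({"text": content["text"]})  # placeholder for embedding
    return results
```

For instance, asyncio.run(handle_batch([FakeRequest({"authorization": "Bearer secret-token"}, {"text": "hi"})])) goes through the authorized branch. Returning one entry per request matters because a batched function has to return a list as long as its input.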

Thank you very much @shrekris, that helped me a lot.