Ray Serve get Header / Dynamic Batching with FastAPI

tattrongvu · October 13, 2023, 11:54am

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello everyone,
thanks for building this awesome framework. It has helped me a lot recently with deploying large open-source language models.
I'm now building a text embedding server with Ray Serve. Specifically, I want to use Ray Serve's dynamic batching feature. My code works, but I have two questions:

1. Does Ray Serve dynamic batching work with a FastAPI ingress? I haven't seen any example of how to do that. For now I use Ray Serve on its own, without any FastAPI integration, to get this feature working.
2. When using dynamic batching, how can I read the headers, not only the request body? I would like to add bearer-token authentication to my server.

Currently my code, without header handling, looks like this:

@serve.batch(max_batch_size=8, batch_wait_timeout_s=0.5)
async def handle_batch(self, request_list):
    for id, request in enumerate(request_list):
        try:
            content = await request.json()
    ....

async def __call__(self, request: Request) -> List[str]:
    return await self.handle_batch(request)

How can I add the header to that? I tried:

@serve.batch(max_batch_size=8, batch_wait_timeout_s=0.5)
async def handle_batch(self, request_list, header_list):
    for id, request in enumerate(request_list):
        try:
            content = await request.json()
    ....

async def __call__(self, request: Request, header: Header) -> List[str]:
    return await self.handle_batch(request, header)

but it keeps failing with:

traceback (most recent call last):
(ServeReplica:default_Embedding_Server pid=11547)   File "/usr/local/lib/python3.10/dist-packages/ray/serve/_private/replica.py", line 633, in invoke_single
(ServeReplica:default_Embedding_Server pid=11547)     result = await method_to_call(*request_args, **request_kwargs)
(ServeReplica:default_Embedding_Server pid=11547) TypeError: Embedding_Server.__call__() missing 1 required positional argument: 'header'

Thanks in advance. It would help me a lot if I could solve this.

shrekris · October 13, 2023, 4:37pm

Hi @tattrongvu, welcome to the forums! Glad to hear that Ray Serve has been helpful.

Yes, it does! See this section in the docs for an example that uses batching and FastAPI together.
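
As a rough illustration of combining the two, here is a minimal sketch, assuming a single GET route that forwards each call to a batched method. The names `build_app`, `EmbeddingServer`, `fake_embed`, and the `/embed` route are hypothetical, and `fake_embed` is only a placeholder for a real embedding model. Ray and FastAPI are imported inside `build_app()` so the helper can be tried without them installed.

```python
from typing import List


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Placeholder model (hypothetical): one 1-d 'embedding' per input text."""
    return [[float(len(text))] for text in texts]


def build_app():
    """Build a Ray Serve application with a FastAPI ingress and a batched method."""
    from fastapi import FastAPI
    from ray import serve

    app = FastAPI()

    @serve.deployment
    @serve.ingress(app)
    class EmbeddingServer:
        # Serve collects individual calls into a list and invokes this once per batch.
        @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.5)
        async def embed_batch(self, texts: List[str]) -> List[List[float]]:
            # Must return one result per input, in the same order.
            return fake_embed(texts)

        @app.get("/embed")
        async def embed(self, text: str) -> List[float]:
            # Called with a single item; awaits that item's slot in the batch result.
            return await self.embed_batch(text)

    return EmbeddingServer.bind()
```

With Ray Serve and FastAPI installed, `serve.run(build_app())` should start the application; the route handler passes one `text` at a time and the batched method receives them as a list.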

You need to access the header through the request object. The request objects are Starlette requests. These Starlette docs show how to access the headers:

For example: request.headers['content-type']
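
Applied to the batched handler from the question, this could look like the sketch below, assuming the token arrives as `Authorization: Bearer <token>`. The names `EXPECTED_TOKEN`, `bearer_token`, `is_authorized`, `handle_requests`, and `build_deployment` are hypothetical, and the embedding call is replaced by a placeholder. The key point is that `__call__` keeps its single `request` argument and the headers are read from each request inside the batch.

```python
from hmac import compare_digest
from typing import Any, List, Mapping, Optional

EXPECTED_TOKEN = "change-me"  # hypothetical secret; load it from config in practice


def bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, else None."""
    # Starlette headers are case-insensitive; a plain dict needs the lowercase key.
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def is_authorized(headers: Mapping[str, str]) -> bool:
    token = bearer_token(headers)
    # Constant-time comparison on bytes to avoid leaking timing information.
    return token is not None and compare_digest(token.encode(), EXPECTED_TOKEN.encode())


async def handle_requests(request_list: List[Any]) -> List[Any]:
    """Process a batch; each item only needs `.headers` and an async `.json()`."""
    results = []
    for request in request_list:
        if not is_authorized(request.headers):
            results.append({"error": "unauthorized"})
            continue
        content = await request.json()
        results.append({"echo": content})  # placeholder for the embedding call
    return results  # one result per request, in the same order


def build_deployment():
    """Wire the handler into Ray Serve (imports are deferred until this is called)."""
    from ray import serve
    from starlette.requests import Request

    @serve.deployment
    class EmbeddingServer:
        @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.5)
        async def handle_batch(self, request_list: List[Request]):
            return await handle_requests(request_list)

        async def __call__(self, request: Request):
            # Pass only the request; its headers travel with it into the batch.
            return await self.handle_batch(request)

    return EmbeddingServer.bind()
```

Returning an error entry per unauthorized request, rather than raising, keeps one bad request from failing the rest of the batch.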

tattrongvu · October 16, 2023, 2:40pm

Thank you very much @shrekris, it helped me a lot.
