Ray Serve LLM does not natively enforce API key authentication via an environment variable or built-in config. The VLLM_API_KEY environment variable is ignored by Ray Serve LLM; authentication must be implemented at the application layer, for example by adding FastAPI middleware to your Ray Serve deployment. This differs from vanilla vLLM, which supports VLLM_API_KEY directly. For production, you should add authentication middleware to your FastAPI app before exposing it externally through an NGINX ingress, since Ray Serve does not provide this out of the box (Ray GitHub Issue #55369).
Would you like more detail or a code example for adding API key authentication middleware to your Ray Serve FastAPI app?
Can you provide more detail on API key authentication middleware for Ray Serve? How can I add it to an existing deployment? Do I have to rebuild the Ray Serve Docker image? What is the best solution?
@RunLLM @AlaEddine Did you figure out how to do this? Can you even do this with Serve LLM, or do I need to discard Serve LLM in favor of just Serve? There is no documentation on this issue anywhere.
Ray Serve LLM does not provide built-in API key authentication; you must implement authentication yourself, typically by adding middleware to your FastAPI app within the Serve deployment. The VLLM_API_KEY environment variable is not supported by Ray Serve LLM, so you cannot enable authentication via config or env var alone (Ray GitHub Issue #55369).
You do not need to discard Serve LLM—just add authentication logic (e.g., FastAPI middleware) to your deployment. Would you like a code example for this approach?
There is no official Ray Serve LLM documentation for API key authentication, but you can add FastAPI middleware to your Ray Serve deployment to enforce API key checks. This approach is demonstrated in Ray community discussions and is supported by Ray Serve’s FastAPI integration (Ray GitHub Issue #55369, Ray Serve Key Concepts).
Would you like a step-by-step code example for adding API key authentication middleware to your Ray Serve FastAPI app?
Certainly! Here’s a step-by-step example of adding API key authentication middleware to a FastAPI app served by Ray Serve. This ensures every request must include a valid API key in the header.
Step 1: Create FastAPI app with middleware
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from ray import serve

API_KEY = "your-secret-api-key"

app = FastAPI()

@app.middleware("http")
async def api_key_auth(request: Request, call_next):
    # Reject any request that does not carry the expected key in the x-api-key header.
    # Returning a response here (rather than raising HTTPException) is important:
    # exceptions raised inside middleware bypass FastAPI's exception handlers.
    if request.headers.get("x-api-key") != API_KEY:
        return JSONResponse(status_code=401, content={"detail": "Invalid or missing API key"})
    return await call_next(request)

@app.get("/hello")
def hello():
    return {"message": "Hello, authenticated user!"}
Step 2: Wrap the FastAPI app in a Ray Serve deployment
@serve.deployment
@serve.ingress(app)
class AuthenticatedApp:
    pass

serve.run(AuthenticatedApp.bind(), route_prefix="/")
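A quick way to verify the middleware, assuming Serve's default HTTP address (http://127.0.0.1:8000) and the route_prefix="/" used above, is a small client call with and without the x-api-key header:

# Sketch of a client-side check; the address and key value are the placeholders from above.
import requests

# Without the header the middleware returns 401.
resp = requests.get("http://127.0.0.1:8000/hello")
print(resp.status_code)  # 401

# With the correct x-api-key header the request reaches the endpoint.
resp = requests.get(
    "http://127.0.0.1:8000/hello",
    headers={"x-api-key": "your-secret-api-key"},
)
print(resp.status_code, resp.json())  # 200 {'message': 'Hello, authenticated user!'}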
To add API key authentication to a Ray Serve LLM deployment, you need to wrap the OpenAI-compatible app (created with build_openai_app) in a FastAPI app with authentication middleware, then use @serve.ingress to integrate it. Ray Serve LLM apps are FastAPI apps under the hood, so you can add middleware as shown below (Ray GitHub Issue #55369).
Here’s a step-by-step example:
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

API_KEY = "your-secret-api-key"

# 1. Build your LLM app
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-llama-3.1-8b",
        model_source="meta-llama/Llama-3.1-8B-Instruct",
    ),
    accelerator_type="L4",
)
llm_app = build_openai_app({"llm_configs": [llm_config]})

# 2. Create a FastAPI app and add authentication middleware
app = FastAPI()

@app.middleware("http")
async def api_key_auth(request: Request, call_next):
    if request.headers.get("x-api-key") != API_KEY:
        return JSONResponse(status_code=401, content={"detail": "Invalid or missing API key"})
    return await call_next(request)

# 3. Mount the LLM app under the FastAPI app
app.mount("/", llm_app)

# 4. Wrap with Ray Serve
@serve.deployment
@serve.ingress(app)
class AuthenticatedLLMApp:
    pass

serve.run(AuthenticatedLLMApp.bind(), route_prefix="/")
Now, all requests to your LLM endpoints (e.g., /v1/chat/completions) require the correct API key.
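For example, a client call would then need to send that header; this sketch assumes Serve's default HTTP address (http://127.0.0.1:8000), the model_id "my-llama-3.1-8b" from the LLMConfig above, and the placeholder key value from the middleware:

import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    headers={"x-api-key": "your-secret-api-key"},
    json={
        "model": "my-llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.status_code, resp.json())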
@RunLLM Great! This is what I have been trying to do but ending in failure. Your code also ends in failure:
File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/routing.py", line 716, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/routing.py", line 736, in app
await route.handle(scope, receive, send)
File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/routing.py", line 462, in handle
await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'Application' object is not callable
This error occurs because build_openai_app returns a Ray Serve Application object, not a FastAPI app, so it cannot be mounted directly with app.mount("/", llm_app). The Ray Serve LLM API does not expose a raw FastAPI app for direct middleware composition (Ray GitHub Issue #55369).
To add authentication, you must implement it as middleware inside the LLM app itself or fork/customize the Ray Serve LLM code to inject your middleware. There is currently no supported way to wrap the Serve LLM app with external FastAPI middleware. Would you like guidance on customizing the Ray Serve LLM source to add authentication?
@RunLLM Okay, great. Then my suspicion is correct that I have to drop Serve LLM for just Serve if I want an API that is suitable for production environments.
You do not have to drop Serve LLM entirely, but out of the box, Serve LLM does not provide built-in API key authentication or a direct way to inject FastAPI middleware for production-grade security. If you require custom authentication at the HTTP layer, plain Ray Serve (with your own FastAPI app and middleware) offers more flexibility and is better suited for production environments where such controls are mandatory (Ray GitHub Issue #55369).
Would you like a code example for a Ray Serve deployment with full FastAPI control and authentication middleware?