Set up an API key to call an LLM via Ray Serve

Dear Ray community,

I deployed KubeRay on Kubernetes and I'm serving my local LLMs. I want to require an API key when calling my LLM via the Ray Serve service.

How can I set up an API key? Is there an environment variable that must be added to the RayService config? What is the option for that?

I call my LLM via http://ray-serve-llm-serve-svc_ip:8000 and expose this service via ingress-nginx for external access.

Ray Serve LLM does not natively enforce API key authentication via an environment variable or built-in config. The VLLM_API_KEY environment variable is ignored by Ray Serve LLM; authentication must be implemented at the application layer, such as by adding FastAPI middleware to your Ray Serve deployment. This is different from vanilla vLLM, which supports VLLM_API_KEY directly. For production, you should add authentication middleware to your FastAPI app before exposing it externally via ingress nginx, as Ray Serve does not provide this out of the box according to Ray GitHub Issue #55369.

Would you like more detail or a code example for adding API key authentication middleware to your Ray Serve FastAPI app?

Thanks for the feedback.

Can you give me more detail on the API key authentication middleware for Ray Serve? How can I add it to the existing deployment? Do I have to rebuild the Ray Serve Docker image? What is the best solution?

@RunLLM @AlaEddine Did you figure out how to do this? Can you even do this with Serve LLM, or do I need to discard Serve LLM in favor of plain Serve? There is no documentation on this issue anywhere.

Ray Serve LLM does not provide built-in API key authentication; you must implement authentication yourself, typically by adding middleware to your FastAPI app within the Serve deployment. The VLLM_API_KEY environment variable is not supported by Ray Serve LLM, so you cannot enable authentication via config or env var alone (Ray GitHub Issue #55369).

You do not need to discard Serve LLM—just add authentication logic (e.g., FastAPI middleware) to your deployment. Would you like a code example for this approach?

@RunLLM Can you show me or point me to documentation that shows me how to accomplish this?

There is no official Ray Serve LLM documentation for API key authentication, but you can add FastAPI middleware to your Ray Serve deployment to enforce API key checks. This approach is demonstrated in Ray community discussions and is supported by Ray Serve’s FastAPI integration (Ray GitHub Issue #55369, Ray Serve Key Concepts).

Would you like a step-by-step code example for adding API key authentication middleware to your Ray Serve FastAPI app?

@RunLLM yes, I would like a step-by-step code example for adding API key authentication middleware to your Ray Serve FastAPI app.

Certainly! Here’s a step-by-step example of adding API key authentication middleware to a FastAPI app served by Ray Serve. This ensures every request must include a valid API key in the header.

Step 1: Create FastAPI app with middleware

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from ray import serve

API_KEY = "your-secret-api-key"

app = FastAPI()

@app.middleware("http")
async def api_key_auth(request: Request, call_next):
    # Reject unauthenticated requests by returning a response directly; raising
    # HTTPException from HTTP middleware is not reliably converted into a 401.
    if request.headers.get("x-api-key") != API_KEY:
        return JSONResponse(status_code=401, content={"detail": "Invalid or missing API key"})
    return await call_next(request)

@app.get("/hello")
def hello():
    return {"message": "Hello, authenticated user!"}

Step 2: Wrap the FastAPI app in a Ray Serve deployment

@serve.deployment
@serve.ingress(app)
class AuthenticatedApp:
    pass

serve.run(AuthenticatedApp.bind(), route_prefix="/")

Step 3: Query the endpoint with the API key

import requests

headers = {"x-api-key": "your-secret-api-key"}
resp = requests.get("http://localhost:8000/hello", headers=headers)
print(resp.json())  # Should print: {"message": "Hello, authenticated user!"}
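
As a quick check, a request without the header should be rejected by the middleware with the 401 JSONResponse defined above:

resp = requests.get("http://localhost:8000/hello")
print(resp.status_code)  # 401
print(resp.json())       # {"detail": "Invalid or missing API key"}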

This pattern is supported by Ray Serve’s FastAPI integration (Ray Serve Key Concepts). For more context, see the discussion in Ray GitHub Issue #55369.

@RunLLM It is not clear from your code example how to integrate Serve LLM.

To add API key authentication to a Ray Serve LLM deployment, you need to wrap the OpenAI-compatible app (created with build_openai_app) in a FastAPI app with authentication middleware, then use @serve.ingress to integrate it. Ray Serve LLM apps are FastAPI apps under the hood, so you can add middleware as shown below (Ray GitHub Issue #55369).

Here’s a step-by-step example:

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

API_KEY = "your-secret-api-key"

# 1. Build your LLM app
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-llama-3.1-8b",
        model_source="meta-llama/Llama-3.1-8B-Instruct",
    ),
    accelerator_type="L4",
)
llm_app = build_openai_app({"llm_configs": [llm_config]})

# 2. Create a FastAPI app and add authentication middleware
app = FastAPI()

@app.middleware("http")
async def api_key_auth(request: Request, call_next):
    if request.headers.get("x-api-key") != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return await call_next(request)

# 3. Mount the LLM app under the FastAPI app
app.mount("/", llm_app)

# 4. Wrap with Ray Serve
@serve.deployment
@serve.ingress(app)
class AuthenticatedLLMApp:
    pass

serve.run(AuthenticatedLLMApp.bind(), route_prefix="/")

Now, all requests to your LLM endpoints (e.g., /v1/chat/completions) require the correct API key.
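
If the app deploys successfully, clients then pass the key as an extra header. For example, with the OpenAI Python client (the model name matches the model_id in the config above; the client setup is just illustrative):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="unused",  # the client requires a value, but auth here is the x-api-key header
    default_headers={"x-api-key": "your-secret-api-key"},
)
resp = client.chat.completions.create(
    model="my-llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)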

@RunLLM Great! This is what I have been trying to do, but it keeps ending in failure. Your code also ends in failure:

 File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
 await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
 File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
 raise exc
 File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
 await app(scope, receive, sender)
 File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
 await self.app(scope, receive, send)
 File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/routing.py", line 716, in __call__
 await self.middleware_stack(scope, receive, send)
 File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/routing.py", line 736, in app
 await route.handle(scope, receive, send)
 File "/opt/conda/envs/ray-llm/lib/python3.12/site-packages/starlette/routing.py", line 462, in handle
 await self.app(scope, receive, send)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'Application' object is not callable

This error occurs because build_openai_app returns a Ray Serve Application object, not a FastAPI app, so it cannot be mounted directly with app.mount("/", llm_app). The Ray Serve LLM API does not expose a raw FastAPI app for direct middleware composition (Ray GitHub Issue #55369).

To add authentication, you must implement it as middleware inside the LLM app itself or fork/customize the Ray Serve LLM code to inject your middleware. There is currently no supported way to wrap the Serve LLM app with external FastAPI middleware. Would you like guidance on customizing the Ray Serve LLM source to add authentication?
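
One unofficial workaround, if you would rather not fork the Serve LLM code: deploy the Serve LLM app under an internal route prefix and put a thin authenticating proxy deployment in front of it. This is only a sketch; the /llm prefix, the upstream URL, and the single forwarded route are assumptions, and it does not handle streaming responses.

import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, Response
from ray import serve

API_KEY = "your-secret-api-key"
# Assumption: the Serve LLM app (build_openai_app) is deployed as a separate Serve
# application with route_prefix="/llm", so it is reachable inside the cluster here.
LLM_UPSTREAM = "http://localhost:8000/llm"

app = FastAPI()

@app.middleware("http")
async def api_key_auth(request: Request, call_next):
    if request.headers.get("x-api-key") != API_KEY:
        return JSONResponse(status_code=401, content={"detail": "Invalid or missing API key"})
    return await call_next(request)

@serve.deployment
@serve.ingress(app)
class AuthProxy:
    def __init__(self):
        self._client = httpx.AsyncClient(base_url=LLM_UPSTREAM, timeout=None)

    @app.post("/v1/chat/completions")
    async def chat_completions(self, request: Request) -> Response:
        # Forward the raw OpenAI-format request body to the internal Serve LLM app.
        upstream = await self._client.post(
            "/v1/chat/completions",
            content=await request.body(),
            headers={"Content-Type": "application/json"},
        )
        return Response(
            content=upstream.content,
            status_code=upstream.status_code,
            media_type=upstream.headers.get("content-type"),
        )

proxy_app = AuthProxy.bind()
# Deploy the proxy at "/" alongside the LLM app at "/llm", e.g. via a multi-application
# Serve config. Make sure your ingress-nginx rules only expose the proxy's routes, so
# the internal /llm prefix cannot be reached from outside the cluster.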

@RunLLM Okay, great. Then my suspicion is correct that I have to drop Serve LLM in favor of plain Serve if I want an API that is suitable for production environments.

You do not have to drop Serve LLM entirely, but out-of-the-box, Serve LLM does not provide built-in API key authentication or a direct way to inject FastAPI middleware for production-grade security. If you require custom authentication at the HTTP layer, plain Ray Serve (with your own FastAPI app and middleware) offers more flexibility and is better suited for production environments where such controls are mandatory (Ray GitHub Issue #55369).

Would you like a code example for a Ray Serve deployment with full FastAPI control and authentication middleware?
