HAProxy Config customization

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version: 2.55.0
  • Python version: 3.10
  • OS: Linux Rocky9
  • Cloud/Infrastructure:
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected: (Hoping for) Ability to further customize HAProxy config.
  • Actual: Some gaps in HAProxy config that we would like to change.

When HAProxy support was announced, we jumped on it. Excellent! In our initial attempt, HAProxy wasn’t starting up (we were running a nightly build). So we enabled HAProxy in Ray Serve to get the per-replica HTTP servers and ran HAProxy outside of Ray Serve, connecting to ports 30000-30005.

With this configuration we saw the massive performance increase as advertised. We then went further, tuning the HAProxy config so it worked well with our clients hitting the Ray Serve deployments.

Now that we’ve looked at the official release and seen how well it works for us, we’ve found some gaps in its generated HAProxy config that we can’t override with environment variables. What we’re hoping for is a way to customize the HAProxy config, such as supplying our own template or injecting overrides somehow (other environment variables?). Here’s a claude-code analysis of what we can do with env variables and what the gaps are:

❯ what RAY_SERVE* environment variables do I need to set to achieve the configuration in start-haproxy.sh ?                                                                                                       
                                                                                                            
⏺ Here's the mapping. I'll split into what you can achieve with env vars and what you can't.                                                                                                                      
                                                                                                                                                                                                                
  Env vars to set (differ from Ray defaults)                                                                                                                                                                      
                                                                                                                                                                                                                  
  RAY_SERVE_HAPROXY_MAXCONN=130000              # yours: 130000, Ray default: 20000                                                                                                                               
  RAY_SERVE_HAPROXY_TIMEOUT_CONNECT_S=5         # yours: 5s, Ray default: None (omitted)                                                                                                                          
  RAY_SERVE_HAPROXY_TIMEOUT_CLIENT_S=60         # yours: 60s, Ray default: 3600                                                                                                                                   
  RAY_SERVE_HAPROXY_TIMEOUT_SERVER_S=120        # yours: 120s, Ray default: None (omitted)                                                                                                                        
  RAY_SERVE_HAPROXY_HEALTH_CHECK_FASTINTER=1s   # yours: 1s, Ray default: 250ms                                                                                                                                   
                                                                                                                                                                                                                  
  Already matching Ray defaults (no change needed)      
                                                                                                                                                                                                                  
  # RAY_SERVE_HAPROXY_NBTHREAD=4                # matches
  # RAY_SERVE_HAPROXY_BALANCE_ALGORITHM=leastconn  # matches                                                                                                                                                      
  # RAY_SERVE_HAPROXY_HEALTH_CHECK_FALL=2       # matches
  # RAY_SERVE_HAPROXY_HEALTH_CHECK_RISE=2       # matches                                                                                                                                                         
  # RAY_SERVE_HAPROXY_HEALTH_CHECK_INTER=5s     # matches
  # RAY_SERVE_HAPROXY_METRICS_PORT=9101         # matches                                                                                                                                                         
  # stats port 8404                             # matches (hardcoded in HAProxyConfig)
                                                                                                                                                                                                                  
  Gaps — no env var available                           
                                                                                                                                                                                                                  
  ┌────────────────────────────────────────────┬─────────────────────────────────────────┬────────────────────────────────────────────────────────────────┐
  │                Your setting                │              Ray behavior               │                             Impact                             │
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤
  │ maxconnrate / maxsessrate (replicas * 512) │ Not supported                           │ No global rate limiting in Ray's config                        │
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤
  │ timeout http-request 30s                   │ None (omitted)                          │ No protection against slow request headers                     │                                                       
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤                                                       
  │ timeout queue 60s                          │ None (omitted)                          │ No limit on time a request waits in queue                      │                                                       
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤                                                       
  │ option redispatch                          │ Not included                            │ Failed request won't retry on another server                   │
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤                                                       
  │ retries 1                                  │ Not included                            │ No retry on connection failure                                 │
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤                                                       
  │ option http-keep-alive                     │ Controlled by template (likely present) │ Probably fine                                                  │
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤                                                       
  │ http-reuse aggressive                      │ always                                  │ always is actually more aggressive despite the name — no issue │
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤                                                       
  │ Health check path /health                  │ Hardcoded /-/healthz                    │ Checks Ray's built-in health, not your app's /health endpoint  │
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤                                                       
  │ Per-server maxconn 32                      │ Not supported                           │ No per-replica connection cap for backpressure                 │
  ├────────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤                                                       
  │ downinter absent                           │ 250ms (always emitted)                  │ Ray checks downed servers faster than yours would — harmless   │
  └────────────────────────────────────────────┴─────────────────────────────────────────┴────────────────────────────────────────────────────────────────┘                                                       
                                                        
  Bottom line                                                                                                                                                                                                     
                                                        
  The 5 env vars above get you most of the way. The meaningful gaps are:                                                                                                                                          
   
  1. maxconnrate/maxsessrate — your rate limiting won't be there                                                                                                                                                  
  2. timeout http-request and timeout queue — no slow-client or queue-depth protection
  3. Health check path — Ray checks /-/healthz (is the proxy alive?) not /health (is your model loaded?)                                                                                                          
  4. Per-server maxconn 32 — no backpressure per replica; a slow replica can absorb all connections                                                                                                               
                                                                                                                                                                                                                  
  Items 3 and 4 are the most operationally significant. Without per-server maxconn, you lose the pairing with max_ongoing_requests: 64 that keeps replicas from being overwhelmed. And /-/healthz won't tell      
  HAProxy that a replica hasn't loaded its model yet. 

(As claude says, we have max_ongoing_requests set to 64.)

We disagree with Claude’s prioritization: setting maxconnrate and maxsessrate, as well as timeout http-request and timeout queue, is also quite important to us.
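
To make the gaps concrete, here is a minimal Python sketch that renders the settings from the table above into an HAProxy config fragment. It is not the actual start-haproxy.sh: the function name, the backend name, the 127.0.0.1 host, and the assumption of one replica per port in 30000-30005 are illustrative, and the frontend section is omitted.

```python
def render_haproxy_cfg(ports, rate_per_replica=512, server_maxconn=32,
                       health_path="/health", host="127.0.0.1"):
    """Render global/defaults/backend sections with the settings listed as gaps.

    Hypothetical helper for illustration only; values mirror the table above.
    """
    ports = list(ports)
    # Global rate limits scale with replica count (replicas * 512 in the table).
    rate = len(ports) * rate_per_replica
    lines = [
        "global",
        "    maxconn 130000",
        "    nbthread 4",
        f"    maxconnrate {rate}",
        f"    maxsessrate {rate}",
        "",
        "defaults",
        "    mode http",
        "    timeout connect 5s",
        "    timeout client 60s",
        "    timeout server 120s",
        "    timeout http-request 30s",  # slow-header protection
        "    timeout queue 60s",         # cap on time spent queued
        "    retries 1",
        "    option redispatch",
        "    option http-keep-alive",
        "    http-reuse aggressive",
        "",
        "backend serve_replicas",        # backend name is an assumption
        "    balance leastconn",
        f"    option httpchk GET {health_path}",
    ]
    for i, port in enumerate(ports):
        # Per-server maxconn gives per-replica backpressure.
        lines.append(
            f"    server replica{i} {host}:{port} check inter 5s fastinter 1s "
            f"fall 2 rise 2 maxconn {server_maxconn}"
        )
    return "\n".join(lines) + "\n"


cfg = render_haproxy_cfg(range(30000, 30006))
print(cfg)
```

With six ports, the rate limits come out to 6 * 512 = 3072. Everything here except the first few global and timeout lines is something we currently cannot express through the RAY_SERVE_HAPROXY_* variables.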

Looking for discussion / feedback before opening a jira request.
Thanks!

Ray Serve’s HAProxy integration exposes many, but not all, HAProxy config options via environment variables. You can set connection and timeout parameters such as RAY_SERVE_HAPROXY_MAXCONN, RAY_SERVE_HAPROXY_TIMEOUT_CONNECT_S, RAY_SERVE_HAPROXY_TIMEOUT_CLIENT_S, RAY_SERVE_HAPROXY_TIMEOUT_SERVER_S, and RAY_SERVE_HAPROXY_HEALTH_CHECK_FASTINTER to customize the generated config. However, there is currently no supported way to inject arbitrary config lines or supply a custom HAProxy template. Nor can you set options like maxconnrate, maxsessrate, timeout http-request, timeout queue, option redispatch, retries, or per-server maxconn, or change the health check path to something other than /-/healthz (docs, constants.py, haproxy_templates.py).

These gaps mean you cannot currently enforce global rate limits, slow-client/queue-depth protection, per-replica connection caps, or application-specific health checks via Ray’s built-in HAProxy management. The only workaround is to run your own HAProxy instance outside Ray Serve, as you did, or to request this feature from the Ray team. Would you like a detailed breakdown of which config options are supported and how the template is generated?
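
As a rough illustration of the supported part, the sketch below collects the five overrides into an environment mapping that could be handed to whatever process starts Ray. The variable names and values are the ones quoted in the analysis above and have not been re-verified against the Ray source; the helper name is hypothetical.

```python
import os

# Names and values as quoted in the analysis above (not independently verified).
HAPROXY_ENV_OVERRIDES = {
    "RAY_SERVE_HAPROXY_MAXCONN": "130000",
    "RAY_SERVE_HAPROXY_TIMEOUT_CONNECT_S": "5",
    "RAY_SERVE_HAPROXY_TIMEOUT_CLIENT_S": "60",
    "RAY_SERVE_HAPROXY_TIMEOUT_SERVER_S": "120",
    "RAY_SERVE_HAPROXY_HEALTH_CHECK_FASTINTER": "1s",
}


def env_with_haproxy_overrides(base=None):
    """Return a copy of the base environment with the overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(HAPROXY_ENV_OVERRIDES)
    return env


# Print only the overrides, in a form that can be pasted into a shell profile.
for name, value in sorted(HAPROXY_ENV_OVERRIDES.items()):
    print(f"export {name}={value}")
```

The returned mapping can be passed as the `env` argument of `subprocess.Popen` when launching Ray, so the variables are set before Ray Serve generates its HAProxy config.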

Thanks @glingleNxn for posting! Glad HAProxy has helped performance.

It sounds like an environment variable like RAY_SERVE_HAPROXY_USER_CONFIG_PATH would be useful here. Since HAProxy supports composed .cfg files, the specified file could be composed into the default .cfg and used for larger sets of overrides.
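
A minimal sketch of what that composition could look like, assuming the proposed (not yet existing) RAY_SERVE_HAPROXY_USER_CONFIG_PATH variable and HAProxy's ability to take several `-f` arguments. This is not how Ray Serve launches HAProxy today; the function name and file paths are hypothetical.

```python
import os


def haproxy_command(generated_cfg, env=None):
    """Build an haproxy argv that appends a user config file when one is set.

    RAY_SERVE_HAPROXY_USER_CONFIG_PATH is the variable proposed in this thread;
    it is an assumption here, not an existing Ray setting.
    """
    env = os.environ if env is None else env
    cmd = ["haproxy", "-f", generated_cfg]
    user_cfg = env.get("RAY_SERVE_HAPROXY_USER_CONFIG_PATH")
    if user_cfg:
        # HAProxy loads files in the order given, so the user file comes last.
        cmd += ["-f", user_cfg]
    return cmd


# Hypothetical paths, for illustration only.
print(haproxy_command(
    "/tmp/ray/haproxy.cfg",
    {"RAY_SERVE_HAPROXY_USER_CONFIG_PATH": "/etc/haproxy/overrides.cfg"},
))
```

How settings in the user file interact with sections already emitted by the generated config (for example, a second defaults section or per-server options) would need to be worked out as part of the design.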

It sounds like your team may understand this requirement quite clearly. Would you be interested in collaborating on a PR?

@glingleNxn if it sounds interesting, please reach out on Ray Slack: Join Slack | Ray

Yes, @seiji, I will reach out on Slack.