Official Ray FastAPI tutorial - how to craft a request?

To learn Ray, I’m following this tutorial:

https://docs.ray.io/en/master/serve/tutorials/web-server-integration.html#scaling-up-a-fastapi-application

After successfully deploying it to my K8s-based Ray cluster (using the official Helm chart), I was wondering how to craft a valid HTTP request to it.

It expects a GET request, but sending it like

curl "http://127.0.0.1:8080/generate?query=Hello%20friend%2C%20how"

gives back:
Internal Server Error

The web server (FastAPI) logs show:

[2021-06-28 08:44:06 +0000] [8] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/fastapi/applications.py", line 199, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/usr/local/lib/python3.7/site-packages/fastapi/routing.py", line 231, in app
    is_coroutine=is_coroutine,
  File "/usr/local/lib/python3.7/site-packages/fastapi/routing.py", line 138, in serialize_response
    return jsonable_encoder(response_content)
  File "/usr/local/lib/python3.7/site-packages/fastapi/encoders.py", line 149, in jsonable_encoder
    sqlalchemy_safe=sqlalchemy_safe,
  File "/usr/local/lib/python3.7/site-packages/fastapi/encoders.py", line 96, in jsonable_encoder
    sqlalchemy_safe=sqlalchemy_safe,
  File "/usr/local/lib/python3.7/site-packages/fastapi/encoders.py", line 127, in jsonable_encoder
    return ENCODERS_BY_TYPE[type(obj)](obj)
  File "pydantic/json.py", line 51, in pydantic.json.lambda
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 2: invalid continuation byte

I’m pretty sure I’m sending the request in the wrong format, but I’m unsure how to debug this further.
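
To rule out the URL encoding itself, the curl URL above can be rebuilt with Python’s standard library. This is only a sketch: `build_generate_url` is a hypothetical helper name, and the host, port, and `/generate` path are simply copied from the curl command.

```python
from urllib.parse import quote, urlencode


def build_generate_url(base_url: str, query: str) -> str:
    """Return the GET URL for /generate with a percent-encoded query parameter."""
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use "+".
    params = urlencode({"query": query}, quote_via=quote)
    return f"{base_url}/generate?{params}"


# Rebuild the URL used in the curl command above.
print(build_generate_url("http://127.0.0.1:8080", "Hello friend, how"))
```

This prints the same URL that was passed to curl, so the percent-encoding of the query string does not look like the problem.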

Hi, which version of Ray are you using? As of Ray 1.4, Ray Serve natively integrates with FastAPI; you can check it out here: Calling Deployments via HTTP and Python — Ray v2.0.0.dev0. This is the recommended way to use Ray Serve with FastAPI. cc @simon-mo

Sorry you ran into an issue with the other tutorial. I’m not sure what that error is, but I’ll investigate. We should probably mark that tutorial as deprecated and remove it soon; that’s our fault for not doing it earlier. Do you get a similar error when running just the bare-bones “quickstart” example from Ray Serve: Scalable and Programmable Serving — Ray v2.0.0.dev0?

Hi, I am using the latest version of ray and I was trying to serve a model through FastAPI which is described here:

https://docs.ray.io/en/master/serve/tutorials/web-server-integration.html#serve-web-server-integration-tutorial

Is my link outdated? (Your link just talks about basic serving, so I would need to glue it together to serve a model.)

Sorry for the confusion! We’re about to do an overhaul of the docs which should smooth out some of these rough edges, and we really appreciate this feedback.

The tutorial you linked is no longer the best practice, and we recommend using the new approach here: Calling Deployments via HTTP and Python — Ray v2.0.0.dev0. Nevertheless, the tutorial you linked should still work. I did find a different bug when running it locally, so perhaps you can try the updated version here? ray/servehandle_fastapi.py at 073ea1d1febe7b29864546016da9c3416b972061 · ray-project/ray · GitHub

I’m trying to narrow down the cause of the UnicodeDecodeError you reported, so that I can file an issue. I’m trying to reproduce that error locally but I haven’t yet been able to. I wonder what’s the best way to get a minimal reproduction…

Thank you! I adjusted my code, but the error stays the same. I’ve added some logging to dig deeper, but I will leave it for now and try to adopt the new way of doing things, serving a model through Ray on K8s.

Python code:

from fastapi.logger import logger
import ray 
from ray import serve

from fastapi import FastAPI
from transformers import pipeline
import logging


app = FastAPI()


gunicorn_logger = logging.getLogger('gunicorn.error')
logger.handlers = gunicorn_logger.handlers
if __name__ != "__main__":
    logger.setLevel(gunicorn_logger.level)
else:
    logger.setLevel(logging.DEBUG)

# Define our deployment.
@serve.deployment(num_replicas=1)
class GPT2:
    def __init__(self):
        self.nlp_model = pipeline("text-generation", model="gpt2")

    async def predict(self, query: str):
        return self.nlp_model(query, max_length=50)

    async def __call__(self, request):
        # predict is a coroutine function, so its result must be awaited.
        return await self.predict(await request.body())


@app.on_event("startup")  # Code to be run when the server starts.
async def startup_event():
    logger.info("startup")
    # Connect to the running Ray cluster in K8s:
    ray.client("example-cluster-ray-head.ray.svc.cluster.local:10001").connect()
    serve.start(http_host=None)  # Start the Ray Serve instance.

    # Deploy our GPT2 Deployment.
    GPT2.deploy()

@app.get("/generate")
async def generate(query: str):
    logger.info(query)
    # Get a handle to our deployment so we can query it in Python.
    handle = GPT2.get_handle(sync=False)
    logger.info(str(handle))
    return await handle.predict.remote(query)

@app.on_event("shutdown")  # Code to be run when the server shuts down.
async def shutdown_event():
    serve.shutdown()  # Shut down Ray Serve.

Port-forward to FastAPI pod/svc: kubectl port-forward svc/fastapi-ray-svc 8080:8080
Call: curl "http://127.0.0.1:8080/generate?query=Hello"

Log:

[2021-06-30 07:55:34 +0000] [10] [INFO] Application startup complete.                                                                                                                                                                                                                
[2021-06-30 08:11:23 +0000] [10] [INFO] Hello
[2021-06-30 08:11:23 +0000] [10] [INFO] RayServeHandle(endpoint='GPT2')
[2021-06-30 08:11:23 +0000] [10] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/fastapi/applications.py", line 199, in __call__                                                                                                                                                                                       
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/applications.py", line 112, in __call__                                                                                                                                                                                     
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__                                                                                                                                                                                
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__                                                                                                                                                                                
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__                                                                                                                                                                                        
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__                                                                                                                                                                                        
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 580, in __call__                                                                                                                                                                                          
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 241, in handle                                                                                                                                                                                            
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/usr/local/lib/python3.7/site-packages/fastapi/routing.py", line 231, in app
    is_coroutine=is_coroutine,
  File "/usr/local/lib/python3.7/site-packages/fastapi/routing.py", line 138, in serialize_response
    return jsonable_encoder(response_content)
  File "/usr/local/lib/python3.7/site-packages/fastapi/encoders.py", line 149, in jsonable_encoder
    sqlalchemy_safe=sqlalchemy_safe,
  File "/usr/local/lib/python3.7/site-packages/fastapi/encoders.py", line 96, in jsonable_encoder
    sqlalchemy_safe=sqlalchemy_safe,
  File "/usr/local/lib/python3.7/site-packages/fastapi/encoders.py", line 127, in jsonable_encoder
    return ENCODERS_BY_TYPE[type(obj)](obj)
  File "pydantic/json.py", line 51, in pydantic.json.lambda
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 3: invalid start byte
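
Reading the last frames, the failure seems to happen while FastAPI serializes the return value, not while it parses the request. The final frame is a lambda in pydantic’s encoder table, and my assumption (not confirmed anywhere in this thread) is that it is the entry for `bytes`, which calls `.decode()` on the value. The sketch below only reproduces the same error message with plain Python; `utf8_decode_error` is a hypothetical helper and the byte values are made up to match the byte and position reported in the log.

```python
from typing import Optional


def utf8_decode_error(raw: bytes) -> Optional[str]:
    """Return the error message from decoding raw as UTF-8, or None if it decodes."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Made-up payload: three NUL bytes followed by 0xad, which cannot start a UTF-8 sequence.
print(utf8_decode_error(b"\x00\x00\x00\xad"))
# Plain ASCII text decodes without error, so this prints None.
print(utf8_decode_error(b"Hello"))
```

If that reading is right, the value coming back from `handle.predict.remote(query)` contains raw bytes rather than the generated text, so logging its `type(...)` before returning it might be a useful next step.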

Following the tutorial you posted, I think it’s missing an import:
NameError: name 'serve' is not defined

I think there should be
from ray import serve

at the top. After adding it I saw that this snippet is not meant to be a complete example:

ray.serve.exceptions.RayServeException: Called `serve.connect()` but there is no instance running on this Ray cluster. Please call `serve.start(detached=True) to start one.

I have tried to put together a basic FastAPI + Ray on K8s program based on the documentation, but it does not work: when I call it, it blocks and then returns nothing.

from fastapi import FastAPI
from ray import serve
import ray 

app = FastAPI()
ray.client("example-cluster-ray-head.ray.svc.cluster.local:10001").connect()

@serve.deployment(route_prefix="/hello")
@serve.ingress(app)
class MyFastAPIDeployment:
    @app.get("/")
    def root(self):
        return "Hello, world!"

I tried adding

serve.start(detached=True, http_options={"host": "0.0.0.0"})

which resulted in the error:

[2021-06-30 13:28:52 +0000] [9] [ERROR] Exception in worker process                                      
Traceback (most recent call last):                                                
  File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()                                                   
  File "/usr/local/lib/python3.7/site-packages/uvicorn/workers.py", line 63, in init_process    
    super(UvicornWorker, self).init_process()                                                 
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process  
    self.load_wsgi()                                                                                                                                                                                                   
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi     
    self.wsgi = self.app.wsgi()                                                                                        
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi                  
    self.callable = self.load()                                                                                                                                                                                                                                                       
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load    
    return self.load_wsgiapp()                                                        
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp                                                                                                                                                                                     
    return util.import_app(self.app_uri)                                                 
  File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
    mod = importlib.import_module(module)                                                                      
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module  
    return _bootstrap._gcd_import(name[level:], package, level)                                          
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import                   
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load                           
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked                                  
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked                             
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module                              
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed                       
  File "/app/app/main.py", line 7, in <module>                                                                                                                                                                         
    serve.start(detached=True, http_options={"host": "0.0.0.0"})                                     
  File "/usr/local/lib/python3.7/site-packages/ray/serve/api.py", line 655, in start                                   
    "serve.start(detached=True) should not be called in anonymous "                                     
RuntimeError: serve.start(detached=True) should not be called in anonymous Ray namespaces because you won't be able to reconnect to the Serve instance after the script exits. If you want to start a long-lived Serve instance, provide a namespace when connecting to Ray. See the documentation for more details: https://docs.ray.io/en/master/namespaces.html?highlight=namespace#using-namespaces.

I tried detached=False which resulted in the error:

[2021-06-30 13:31:00 +0000] [8] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/fastapi/applications.py", line 199, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/usr/local/lib/python3.7/site-packages/fastapi/routing.py", line 208, in app
    dependency_overrides_provider=dependency_overrides_provider,
  File "/usr/local/lib/python3.7/site-packages/fastapi/dependencies/utils.py", line 550, in solve_dependencies
    solved = await run_in_threadpool(call, **sub_values)
  File "/usr/local/lib/python3.7/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/ray/serve/http_util.py", line 165, in get_current_servable_instance
    return serve.get_replica_context().servable_object
  File "/usr/local/lib/python3.7/site-packages/ray/serve/api.py", line 1012, in get_replica_context
    raise RayServeException("`serve.get_replica_context()` "
ray.serve.exceptions.RayServeException: `serve.get_replica_context()` may only be called from within a Ray Serve backend.                       

Could you point me towards a working example?

The last thing I tried today was just serving from the K8s example cluster which is described in this howto, running this:

import ray
from ray import serve

# Connect to the running Ray cluster.
ray.init(address="auto")
# Bind on 0.0.0.0 to expose the HTTP server on external IPs.
serve.start(detached=True, http_options={"host": "0.0.0.0"})


@serve.deployment(route_prefix="/hello")
def hello(request):
    return "hello world"

hello.deploy()

on the head node yields:

ray.serve.exceptions.RayServeException: Called `serve.connect()` but there is no instance running on this Ray cluster. Please call `serve.start(detached=True) to start one.

So I tried that:

serve.start(detached=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/api.py", line 655, in start
    "serve.start(detached=True) should not be called in anonymous "
RuntimeError: serve.start(detached=True) should not be called in anonymous Ray namespaces because you won't be able to reconnect to the Serve instance after the script exits. If you want to start a long-lived Serve instance, provide a namespace when connecting to Ray. See the documentation for more details: https://docs.ray.io/en/master/namespaces.html?highlight=namespace#using-namespaces.

The documentation gives no hint about creating a namespace or having to start a Serve instance (I thought the K8s operator did this for me). Am I doing it wrong?

Thanks a lot for the further details. The “namespace” issue occurs when using Ray Client; to fix it you can replace your call ray.client("example-cluster-ray-head.ray.svc.cluster.local:10001").connect() with ray.client("example-cluster-ray-head.ray.svc.cluster.local:10001").namespace("my_namespace").connect(), and then it should work with detached=True.

I’ve verified that the following script works locally after calling ray start --head but have not tried it on K8s:

from fastapi import FastAPI
from ray import serve
import ray 

app = FastAPI()
ray.client("localhost:10001").namespace("my_namespace").connect()

serve.start(detached=True)
@serve.deployment(route_prefix="/hello")
@serve.ingress(app)
class MyFastAPIDeployment:
    @app.get("/")
    def root(self):
        return "Hello, world!"

MyFastAPIDeployment.deploy()

@eoakes @simon-mo any ideas about the UnicodeDecodeError, or about whether the Serve+K8s tutorial is working as intended or needs to be updated?