Exception ignored in: 'ray._raylet.task_execution_handler'

Ray 1.9 on Windows 10

My program hangs with the message Exception ignored in: 'ray._raylet.task_execution_handler', followed by a "StackTrace Information" section that is just a list of PyInit__raylet entries. What should I be looking at to resolve the problem?

import ray
from ray import serve

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

http_options = {'host': "127.0.0.1", 'port': 8787, 'location': "HeadOnly", 'num_cpus': 2}

ray.init(address="127.0.0.1:8787", namespace="serve")
serve.start(http_options=http_options)

@serve.deployment(route_prefix="/api")
@serve.ingress(app)
class Deployment:
    def __init__(self):
        load_data()

    @app.post("/first")
    async def do_first(self):
        ...  # handler body elided in the original post

C:\...\first.py
2022-02-02 18:46:26,018 INFO worker.py:842 -- Connecting to existing Ray cluster at address: 127.0.0.1:8787
2022-02-02 18:46:28,230 INFO api.py:414 -- Connecting to existing Serve instance in namespace 'serve'.
2022-02-02 18:46:28,280 INFO api.py:242 -- Updating deployment 'Deployment'. component=serve deployment=Deployment
 pid=8712) 2022-02-02 18:46:28,380      INFO deployment_state.py:874 -- Stopping 1 replicas of deployment 'Deployment' with outdated versions. component=serve deployment=Deployment
 pid=8712) 2022-02-02 18:46:48,699      INFO deployment_state.py:912 -- Adding 1 replicas to deployment 'Deployment'. component=serve deployment=Deployment
 pid=13412) Exception ignored in: 'ray._raylet.task_execution_handler'
 pid=13412) [2022-02-02 18:46:54,646 C 13412 12068] direct_actor_transport.cc:151:  Check failed: objects_valid 0  1
 pid=13412) *** StackTrace Information ***
 pid=13412)     PyInit__raylet
 pid=13412)     PyInit__raylet
 pid=13412)     PyInit__raylet
 pid=13412)     PyInit__raylet
 pid=13412)     PyInit__raylet
 pid=13412)     PyInit__raylet
 pid=13412)     PyInit__raylet
 pid=13412)     PyInit__raylet
 pid=13412)     PyInit__raylet
 pid=13412)

followed by multiple messages like this:

 pid=9784) 2022-02-02 19:23:57,745      WARNING deployment_state.py:1123 -- Deployment 'Deployment' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.component=serve deployment=Deployment
 pid=9784) 2022-02-02 19:24:27,747      WARNING deployment_state.py:1123 -- Deployment 'Deployment' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.component=serve deployment=Deployment

Hi @Henry_Thornton, do you mind sharing the code for load_data()? To me it looks like the replica failed to finish executing its constructor. It would also help if you could try the following:

  1. Use a very simple __init__() and do_first, e.g. just printing "hi", to confirm that Ray Serve runs with a minimal setup.
  2. Provide the full logs from the very first deployment. The logs here suggest there is already a running replica in the Ray cluster. You can get a clean start with ray stop --force && ray start --head.

First, the application code works without Ray; using Ray causes the error.

The full error trace is:

2022-02-09 21:42:20,321 INFO worker.py:852 -- Connecting to existing Ray cluster at address: 127.0.0.1:8787
2022-02-09 21:42:20,834 INFO api.py:426 -- Connecting to existing Serve instance in namespace 'serve'.
ray is initialized
2022-02-09 21:42:20,880 INFO api.py:249 -- Updating deployment 'Deployment'. component=serve deployment=Deployment
 pid=5592) 2022-02-09 21:42:20,924      INFO deployment_state.py:920 -- Adding 1 replicas to deployment 'Deployment'. component=serve deployment=Deployment
 pid=7020) Exception ignored in: 'ray._raylet.task_execution_handler'
 pid=7020) [2022-02-09 21:42:23,051 C 7020 4700] direct_actor_transport.cc:158:  Check failed: objects_valid 0  1
 pid=7020) *** StackTrace Information ***
 pid=7020)     PyInit__raylet
 pid=7020)     PyInit__raylet
 pid=7020)     PyInit__raylet
 pid=7020)     PyInit__raylet
 pid=7020)     PyInit__raylet
 pid=7020)     PyInit__raylet
 pid=7020)     PyInit__raylet
 pid=7020)     PyInit__raylet
 pid=7020)     PyInit__raylet
 pid=7020)
 pid=7020) <class 'UnicodeDecodeError'> 'utf-8' codec can't decode byte 0x97 in position 2049: invalid start byte <traceback object at 0x0000027E7AF9C940> xed307
 pid=5592) 2022-02-09 21:42:51,042      WARNING deployment_state.py:1131 -- Deployment 'Deployment' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.component=serve deployment=Deployment
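One detail in the log above is worth isolating: the UnicodeDecodeError names byte 0x97, which is never a valid UTF-8 start byte. In the Windows-1252 code page that byte is an em dash, so one possible (unconfirmed) explanation is that some text reaching the replica was written in a Windows code page and then decoded as UTF-8. The sketch below uses only the standard library; the helper name find_non_utf8 and the sample bytes are made up for illustration and do not come from the application.

```python
from typing import Optional


def find_non_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first undecodable byte.
        return err.start
    return None


# Hypothetical sample: an em dash as a Windows-1252 editor would store it.
sample = b"load" + b"\x97" + b"data"

offset = find_non_utf8(sample)
print("first invalid UTF-8 byte at offset:", offset)

# The same bytes decode cleanly as Windows-1252; ascii() keeps the output
# printable on consoles that cannot display the em dash.
print(ascii(sample.decode("cp1252")))
```

Running the helper over the raw bytes of any file or value that load_data() reads (for example open(path, "rb").read()) would show whether such a byte is present and where.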

The load_data() function is of the form:

with SqliteDict(dbname, flag="c", autocommit=True, encode=msgpack_serialize, decode=msgpack_deserialize) as db:

sqlitedict (https://github.com/RaRe-Technologies/sqlitedict) is a persistent dictionary built on top of sqlite3. Note that sqlitedict has "multithreaded support" as a workaround for sqlite3 limitations in Python.

To narrow the problem, I replaced sqlitedict/sqlite3 with basic file storage and the application worked with Ray/Serve.

Does Ray have a problem with sqlite3 and databases generally?

Traceback received by the client:

Traceback (most recent call last):
  File "C:\Users\User\anaconda3\lib\site-packages\requests\models.py", line 910, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Users\User\anaconda3\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\User\anaconda3\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\User\anaconda3\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\... .py", line 599, in <module>
    result = response.json()
  File "C:\Users\User\anaconda3\lib\site-packages\requests\models.py", line 917, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: [Errno Expecting value] Task Error. Traceback: ray::RayServeWrappedReplica.handle_request() (pid=12860, ip=127.0.0.1)
  File "C:\Users\User\anaconda3\lib\site-packages\ray\serve\replica.py", line 170, in wrap_to_ray_error
    raise exception
  File "C:\Users\User\anaconda3\lib\site-packages\ray\serve\replica.py", line 326, in invoke_single
    result = await method_to_call(*args, **kwargs)
  File "C:\Users\User\anaconda3\lib\site-packages\ray\serve\api.py", line 600, in __call__
    await self._serve_app(
  File "C:\Users\User\anaconda3\lib\site-packages\fastapi\applications.py", line 212, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\User\anaconda3\lib\site-packages\starlette\applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\User\anaconda3\lib\site-packages\starlette\middleware\errors.py", line 181, in __call__
    raise exc
  File "C:\Users\User\anaconda3\lib\site-packages\starlette\middleware\errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\User\anaconda3\lib\site-packages\starlette\exceptions.py", line 82, in __call__
    raise exc
  File "C:\Users\User\anaconda3\lib\site-packages\starlette\exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "C:\Users\User\anaconda3\lib\site-packages\starlette\routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "C:\Users\User\anaconda3\lib\site-packages\starlette\routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "C:\Users\User\anaconda3\lib\site-packages\starlette\routing.py", line 61, in app
    response = await func(request)
  File "C:\Users\User\anaconda3\lib\site-packages\fastapi\routing.py", line 216, in app
    solved_result = await solve_dependencies(
  File "C:\Users\User\anaconda3\lib\site-packages\fastapi\dependencies\utils.py", line 529, in solve_dependencies
    solved = await run_in_threadpool(call, **sub_values)
  File "C:\Users\User\anaconda3\lib\site-packages\starlette\concurrency.py", line 39, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
AttributeError: module 'anyio' has no attribute 'to_thread'.: 0
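The last line of this traceback points at something separate from the replica crash: Starlette's run_in_threadpool calls anyio.to_thread.run_sync, and the installed anyio has no to_thread module. As far as I know, to_thread was introduced in anyio 3.0, so an older anyio sitting next to a newer Starlette would fail in exactly this way. A minimal sketch for checking what is installed, assuming Python 3.8+ (for importlib.metadata); the helper name installed_versions is hypothetical, and which versions are compatible should be confirmed against each package's own requirements.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def installed_versions(*packages: str) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages that appear in the traceback above.
versions = installed_versions("anyio", "starlette", "fastapi", "ray")
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

This should be run with the same interpreter the Ray workers use (here, the Anaconda Python shown in the traceback paths), since a different environment may report different versions.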

Upgraded to Ray 1.10 on Windows 10.
Problem remains.

Hi @Henry_Thornton , sorry you’re running into this – it sounds like this could be a bug in Ray. Could you please provide a minimal reproduction script that we can run to investigate further?

Also, does the same problem happen if you just use Ray without Serve (e.g. just using a Ray Task?)

The minimal examples for both Ray and Ray+Serve work. But it still doesn't work in the full application. Bugger.

Let's leave it for now. Maybe the answer will appear later... Thanks.