No request can complete until all requests are ready

Thanks for the reply, @shrekris. This info really helps me understand what is going on.

This behaviour is pretty surprising. It would probably be a good idea to improve the developer experience here. I only noticed that my code was wrong through testing, so maybe you could do some of the following:

  1. Improve the Ray Serve docs to explain how to run long-running CPU/GPU-bound tasks.
  2. Maybe it is possible to notify developers that long-running tasks are not supported. Killing long-running tasks might be one option. Logging long-running tasks might be another. I haven’t used asyncio much, so I am not sure about the feasibility of either of these solutions.
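
On the logging option: plain asyncio can already do something like this in debug mode, where it logs a warning whenever a single step on the event loop takes longer than loop.slow_callback_duration (0.1 seconds by default). Below is a minimal sketch of that mechanism using only the standard library. It is not Ray Serve code, and ListHandler and blocking_handler are hypothetical names I made up for illustration.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Hypothetical helper: collects log messages so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
asyncio_logger = logging.getLogger("asyncio")
asyncio_logger.addHandler(handler)
asyncio_logger.setLevel(logging.WARNING)


async def blocking_handler():
    # Stand-in for user code that blocks the event loop.
    time.sleep(0.3)


async def main():
    loop = asyncio.get_running_loop()
    # Steps slower than this many seconds are reported in debug mode.
    loop.slow_callback_duration = 0.1
    await blocking_handler()


# debug=True turns on asyncio's slow-callback warnings.
asyncio.run(main(), debug=True)

# asyncio reports slow steps with a message of the form "Executing ... took ... seconds".
slow = [m for m in handler.messages if "took" in m]
print(slow)
```

Whether Ray Serve could turn this on for replicas is an open question for me; the sketch only shows that the detection itself is available in asyncio.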

It might also make sense to build this functionality directly into Ray. This could be done in one of two ways:

  1. Run all user-written code in a loop.run_in_executor call. This way user-written code can never block Ray’s internal mechanics.
  2. Alternatively, maybe you could add a flag to the deployment annotation, like this: @deployment(cpu_bound=True). When it is true, you could make the loop.run_in_executor call on behalf of the user.
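
To make the run_in_executor idea concrete, here is a rough sketch in plain asyncio of what I imagine happening on the user's behalf. It does not use any Ray API, and cpu_bound_work, handle_request, and heartbeat are hypothetical names chosen for illustration. The heartbeat coroutine stands in for other requests that should keep making progress while the slow call runs.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_work(n: int) -> int:
    # Stand-in for a long-running, blocking computation.
    time.sleep(0.2)
    return n * n


async def handle_request(n: int, executor: ThreadPoolExecutor) -> int:
    loop = asyncio.get_running_loop()
    # Offload the blocking call so the event loop stays free.
    return await loop.run_in_executor(executor, cpu_bound_work, n)


async def heartbeat(ticks: list) -> None:
    # Keeps ticking only if the event loop is not blocked.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main():
    ticks = []
    with ThreadPoolExecutor(max_workers=2) as executor:
        result, _ = await asyncio.gather(
            handle_request(7, executor),
            heartbeat(ticks),
        )
    return result, ticks


result, ticks = asyncio.run(main())
print(result, len(ticks))
```

One caveat I am aware of: a thread pool only helps when the blocking code releases the GIL (sleeping, I/O, many native libraries). For pure-Python CPU-bound code, a ProcessPoolExecutor would probably be the better choice.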

Thanks again for your insight. Have an awesome week.