Thanks for the reply @shrekris . This info really helps me understand what is going on.
This behaviour is pretty surprising. It would probably be a good idea to improve the developer experience here. I only noticed that my code was wrong through testing, so maybe you could do some of the following:
- Improve the Ray Serve docs to explain the recommended way to run long-running CPU/GPU-bound tasks.
- Maybe it is possible to notify developers that long-running tasks are not supported. Killing long-running tasks might be one option. Logging long-running tasks might be another. I haven’t used asyncio much so I am not sure about the feasibility of either of these solutions.
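On the logging idea: as far as I can tell, plain asyncio already has a debug mode that logs a warning when a single step of a task holds the event loop longer than `loop.slow_callback_duration`. Here is a minimal sketch using only the standard library. It does not touch Ray at all, and I am not assuming Ray Serve exposes this setting; `blocking_handler`, `ListHandler`, and `run_with_slow_task_logging` are names I made up for illustration.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects log messages in a list so they can be inspected afterwards."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def blocking_handler():
    # Threshold (in seconds) above which asyncio's debug mode logs a warning.
    asyncio.get_running_loop().slow_callback_duration = 0.1
    # Stand-in for CPU-bound work: this blocks the event loop thread.
    time.sleep(0.3)
    return "done"


def run_with_slow_task_logging():
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)
    try:
        # debug=True enables asyncio's slow-callback detection.
        result = asyncio.run(blocking_handler(), debug=True)
    finally:
        logger.removeHandler(handler)
    return result, handler.messages


if __name__ == "__main__":
    result, messages = run_with_slow_task_logging()
    for message in messages:
        print(message)
```

This only reports the problem after the blocking step has finished, so it would help with detection, not with killing a stuck task.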
It might also make sense to build this functionality directly into Ray. This could be done in one of two ways:
- Run all user-written code in a `loop.run_in_executor` call. This way user-written code can never block Ray’s internal mechanics.
- Alternatively, maybe you could add a flag to the `deployment` annotation, like this: `@deployment(cpu_bound=True)`. When that is true, you could execute the `loop.run_in_executor` call on behalf of the user.
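To make the `loop.run_in_executor` idea concrete, here is a rough sketch in plain asyncio of what "running it on behalf of the user" could look like. This is not Ray code: the `offload_blocking` decorator is a hypothetical stand-in for the proposed `cpu_bound=True` flag, which as far as I know does not exist, and `slow_square` is just a placeholder for real work.

```python
import asyncio
import functools
import time
from concurrent.futures import ThreadPoolExecutor

# Shared pool for offloaded work. A thread pool keeps the sketch simple;
# for CPU-bound pure-Python code a ProcessPoolExecutor may be a better
# fit because of the GIL.
_executor = ThreadPoolExecutor(max_workers=2)


def offload_blocking(func):
    """Hypothetical helper: turn a blocking function into a coroutine
    function whose body runs in an executor instead of on the event loop."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(_executor, call)

    return wrapper


@offload_blocking
def slow_square(x):
    time.sleep(0.2)  # stand-in for long-running blocking work
    return x * x


async def main():
    ticks = []

    async def heartbeat():
        # Records a tick each time the event loop gets to run this task.
        for _ in range(5):
            ticks.append(time.monotonic())
            await asyncio.sleep(0.02)

    # Both run concurrently: the heartbeat is not starved by slow_square.
    result, _ = await asyncio.gather(slow_square(7), heartbeat())
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The user-facing function stays an ordinary synchronous function, and the wrapper is the only place that knows about the executor, which is roughly the experience I was hoping a flag on the deployment could give.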
Thanks again for your insight. Have an awesome week.