Using Ray for submitting async tasks from a FastAPI backend

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I’m trying to build an app in which users can submit jobs to train models. Training should happen asynchronously so that other requests aren’t blocked, so I’m using a Ray actor for it. The user can then query whether the job has finished.
To do this, I submit the job with id = model.train.remote() and store the id.
Later, in a different function, I query it with ready, not_ready = ray.wait([id], timeout=0). However, this always returns my job as not_ready, even though the Ray dashboard shows that the function has finished executing.
Calling ray.wait without a timeout in the same function that submits the job does work, but it blocks everything else:

id = model.train.remote()
ready, not_ready = ray.wait([id])

I’m not sure what the issue is here.

Hi @manangoel99, thank you for surfacing this! @sangcho @Stephanie_Wang if a future id is completed, would you expect ray.wait([id], timeout=0) to return it as ready? I would assume that the future check would happen even with the 0 timeout.

I tried this myself, and the future is returned as ready, as expected:

In [1]: import ray; ray.init()

In [2]: @ray.remote
   ...: def foo():
   ...:     return 1

In [3]: a = foo.remote()

In [4]: ready, not_ready = ray.wait([a], timeout=0)

In [5]: ready
Out[7]: [ObjectRef(c8ef45ccd0112571ffffffffffffffffffffffff0100000001000000)]

In [8]: not_ready
Out[8]: []