How severely does this issue affect your experience of using Ray?
None: Just asking a question out of curiosity
Low: It annoys or frustrates me for a moment.
Medium: It contributes to significant difficulty in completing my task, but I can work around it.
High: It blocks me from completing my task.
Medium
We’re experiencing a situation in which calls that schedule tasks, f.remote(), take a very long time to return. These calls are supposed to be essentially asynchronous and return quickly.
This happens specifically when we’re submitting many tasks (tens of thousands).
This is bad for compute utilization.
Qs:
How do calls to f.remote() work?
How can one diagnose why these calls are taking so long to return?
Are there any good patterns or best practices for submitting many tasks?
Note that object serialization is currently a synchronous operation, which f.remote() does implicitly for arguments. Not sure if that’s the cause, curious to hear what y’all find…
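One rough way to check whether argument serialization is the culprit is to time it outside of Ray. The sketch below uses the standard library's pickle as a stand-in, which is an assumption: Ray's own serialization path differs, so treat the numbers as a ballpark only. The names time_serialization and big_arg are made up for illustration.

```python
import pickle
import time


def time_serialization(obj, repeats=3):
    """Return (best_seconds, size_bytes) for pickling obj.

    Assumption: stdlib pickle is only a rough proxy for what Ray does
    when it serializes task arguments.
    """
    best = float("inf")
    size = 0
    for _ in range(repeats):
        start = time.perf_counter()
        payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        best = min(best, time.perf_counter() - start)
        size = len(payload)
    return best, size


# Hypothetical large argument standing in for whatever you pass to f.remote().
big_arg = list(range(1_000_000))
seconds, size = time_serialization(big_arg)
print(f"{size / 1e6:.1f} MB serialized in {seconds * 1e3:.1f} ms per call")
```

If the per-call time multiplied by the number of tasks is close to the slowdown you see at submission, serialization is a likely suspect.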
Right, so if any args passed to f.remote() are large, they will have to be serialized on each call, which could add up over hundreds of tasks, unless you only pass object_refs (which will still have to be deserialized from the object store).
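Here is a minimal sketch of two things that might help, assuming the large argument is the same across tasks: put it in the object store once with ray.put() and pass the ref, and cap the number of in-flight tasks with ray.wait(). The helper submit_with_backpressure, the function run_on_ray, the task f, and the limit of 1000 are all made up for illustration, not something from your code or a Ray API.

```python
def submit_with_backpressure(submit, wait, items, max_in_flight=1000):
    """Submit one task per item, keeping at most max_in_flight pending.

    submit(item) returns a handle; wait(handles) returns (done, pending)
    with at least one handle in done. Completion order is not preserved.
    """
    finished, pending = [], []
    for item in items:
        if len(pending) >= max_in_flight:
            # Block until something finishes before submitting more.
            done, pending = wait(pending)
            finished.extend(done)
        pending.append(submit(item))
    while pending:
        done, pending = wait(pending)
        finished.extend(done)
    return finished


def run_on_ray(big_value, n_tasks):
    """Hypothetical wiring of the helper above to Ray."""
    import ray  # imported lazily so the helper stays dependency-free

    ray.init(ignore_reinit_error=True)

    @ray.remote
    def f(data, i):
        # Ray resolves the ObjectRef argument to its value inside the task.
        return len(data) + i

    # Store the large value once; each f.remote() then only ships a ref.
    big_ref = ray.put(big_value)
    done_refs = submit_with_backpressure(
        submit=lambda i: f.remote(big_ref, i),
        wait=lambda refs: ray.wait(refs, num_returns=1),
        items=range(n_tasks),
        max_in_flight=1000,
    )
    return ray.get(done_refs)
```

Calling run_on_ray(list(range(1_000_000)), 50_000) would be one way to try it. Whether this fixes the slow f.remote() calls depends on whether serialization or the number of pending tasks is actually the bottleneck.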