Hello, I am new to Ray and have a few questions about task scheduling and retry:
- If worker 1 owns task A, is task A always going to be scheduled onto a worker other than worker 1? (Technically an owner is also a worker, so it should be able to run a task?)
- According to Ray’s whitepaper, when a task is scheduled, its owner will “send the task specification over gRPC to the leased worker”. What is the “task specification” specifically? Where is it stored, in-process of the owner?
- Suppose worker 1 scheduled task A to worker 2 on the same node, and that node failed and task A also failed as a consequence. According to the documentation, "when a worker is executing a task, if the worker dies unexpectedly, either because the process crashed or because the machine failed, Ray will rerun the task (after a delay of several seconds) until either the task succeeds or the maximum number of retries is exceeded. " How/Where would Ray retrieve the specification of task A in order to rerun the task?
Any clarification would be greatly appreciated. Thanks in advance!