Questions about fault tolerance in a Ray cluster

Hello, I have two questions about fault tolerance in a Ray cluster:

  • When a worker node fails, how do tasks owned by workers on that node get rescheduled onto other running nodes?
  • When the head node fails, do worker nodes continue scheduling and running the tasks they own, or do they stop because they can no longer talk to the GCS? If the latter, is there a timeout for how long they wait for the GCS?

Thank you!