Ray Serve fault tolerance

Hello team

We have a RayService setup in which the head node is connected to Redis for GCS fault tolerance and has "num-cpus" set to 0 so that it does not handle any traffic. The head and all workers are connected through an Istio mesh.

I have observed the following scenario: when the RayService head goes down for more than 10 minutes, say for 30 minutes, the Ray workers keep handling requests for 10 minutes (the server is configured at EveryNode). After that, the workers reject requests but stay alive.

After 30 minutes, when the head service is back up, there is a huge spike in requests on the head node and on the worker nodes. During this time the error QPS increases, real-time requests are rejected, and it takes a while for errors and latency to come down and for real-time traffic to be handled normally.
There are no client-side retries.

During this time I see the API below being called from the head:
http://:10002/ray.rpc.CoreWorkerService/PushTask

My questions are:

  1. When the head node is down and the workers are down, are any requests being queued up? If so, where?
  2. Why don't the worker nodes go down? They are still alive after losing the connection to the head node for 10 minutes, and after 30 minutes they take traffic again.
  3. How do I prevent this? When the head node comes back up, I want it to serve real-time traffic rather than stale requests, since our request timeout is 50 ms.
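
To make question 3 concrete, here is a rough sketch of the behavior I am after: any request that has already waited longer than our 50 ms timeout should be dropped instead of executed once the head is back. This is plain Python for illustration only, not Ray Serve API. `PendingRequest`, `is_stale`, and `drop_stale` are hypothetical names, and I am assuming the enqueue time of each request is known.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

REQUEST_TIMEOUT_S = 0.050  # the 50 ms request timeout mentioned above


@dataclass
class PendingRequest:
    """Hypothetical stand-in for a queued request; not a Ray Serve type."""
    payload: str
    enqueued_at: float  # seconds, on the same clock as `now` below


def is_stale(req: PendingRequest, now: float,
             timeout_s: float = REQUEST_TIMEOUT_S) -> bool:
    # A request is stale once it has waited strictly longer than the timeout.
    return (now - req.enqueued_at) > timeout_s


def drop_stale(queue: List[PendingRequest],
               now: Optional[float] = None,
               timeout_s: float = REQUEST_TIMEOUT_S) -> List[PendingRequest]:
    # Keep only requests that a client could still be waiting on.
    if now is None:
        now = time.monotonic()
    return [req for req in queue if not is_stale(req, now, timeout_s)]
```

For example, with a request queued at the start of a 30-minute (1800 s) outage and another queued 10 ms before recovery, only the second one should survive `drop_stale(queue, now=1800.0)`. What I would like to know is whether Ray Serve can do the equivalent of this for whatever is replayed when the head returns.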

Thanks