Thanks for the details on what happens under the hood in a `get` call. I had looked through the code a bit, but the high-level overview is helpful for guiding that reading.
The high-level workload is a neural network inference service. I have a single on-prem machine with 12 cores (24 virtual) and a recent NVIDIA GPU. The work whose result I am getting is primarily PyTorch running on the GPU. A few CPU cores are also used by other processes (including Redis), but those processes are constrained to at most 12 of the 24 virtual CPUs through statically assigned CPU affinities. Nothing else runs on the GPU. I have the Python garbage collector disabled during the call to `ray.get`.
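For anyone reproducing this, one way to pause the garbage collector only around the call is a small context manager. This is a sketch (the name `gc_paused` is hypothetical). Note that `gc.disable()` only turns off Python's cyclic collector in the calling process: reference counting still frees objects immediately, and Ray's worker processes are not affected.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable Python's cyclic garbage collector in this process."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state, even if the body raised.
        if was_enabled:
            gc.enable()
```

Usage would be something like `with gc_paused(): ray.get(ref, timeout=0.04)`. Restoring the previous state (rather than always calling `gc.enable()`) keeps it safe to nest or to use when the collector is already off.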
As a basic test, I ran the following code overnight. It isn't my actual workload, since I wanted something others could reproduce.
```python
import random
import time

import numpy as np
import ray
from ray.exceptions import GetTimeoutError

mean_time = 0.04  # requested get() timeout, in seconds


@ray.remote
def foo():
    # Simulated task: sleeps 20-60 ms, so some calls finish before the
    # timeout and others exceed it.
    time.sleep(mean_time + random.uniform(-0.02, 0.02))


ray.init()
times = []
while True:
    remote = foo.remote()
    try:
        start = time.perf_counter()
        ray.get([remote], timeout=mean_time)
    except GetTimeoutError:
        pass
    time_taken = time.perf_counter() - start
    # Report any get() that overshoots the timeout by more than 10%.
    if time_taken > mean_time * 1.1:
        print(time_taken, mean_time)
    times.append(time_taken)
    # Every 1000 calls, print the latency distribution out to the extreme tail.
    if len(times) % 1000 == 0:
        print(
            len(times),
            np.percentile(
                times,
                (50, 75, 90, 95, 99, 99.9, 99.99, 99.999, 99.9999,
                 99.99999, 99.999999, 99.9999999, 99.99999999, 99.999999999),
            ),
        )
```
```
1215000 [0.04018716 0.04024355 0.04028641 0.04031676 0.04038924 0.04047388 0.04091035 0.0595979 0.06972685 0.07359515 0.07406085 0.07410742 0.07411208 0.07411254]
```
After 1.2 million calls, 99% of calls return no more than 1% past the requested timeout and 99.99% return no more than 2.5% past it, but the tail is long: the 99.999th percentile is about 50% over, and the worst case takes almost double the requested timeout. On my real workload, I have occasionally seen it take nearly 3x the requested timeout.
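To make the tail easier to read, the printed percentiles can be divided by the requested timeout. A quick sketch using the numbers from the run above (the variable names are just for illustration):

```python
import numpy as np

timeout = 0.04  # the requested timeout (mean_time in the script)
percentiles = [50, 75, 90, 95, 99, 99.9, 99.99, 99.999, 99.9999,
               99.99999, 99.999999, 99.9999999, 99.99999999, 99.999999999]
latencies = np.array([0.04018716, 0.04024355, 0.04028641, 0.04031676,
                      0.04038924, 0.04047388, 0.04091035, 0.0595979,
                      0.06972685, 0.07359515, 0.07406085, 0.07410742,
                      0.07411208, 0.07411254])

# Latency as a multiple of the requested timeout, and the overshoot in percent.
ratio = latencies / timeout
for p, r in zip(percentiles, ratio):
    print(f"p{p}: {r:.3f}x timeout ({(r - 1) * 100:+.1f}%)")
```

This gives about 1.010x at p99, 1.023x at p99.99, 1.490x at p99.999 and 1.853x at the maximum. One caveat when reading it: with 1,215,000 samples and NumPy's default linear interpolation, every percentile from 99.9999 upward falls between the three slowest calls, so those values describe only a handful of events rather than a well-sampled tail.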
Would I be better off waiting until just before the timeout expires and then making a get request with a timeout of 0? I suspect, though, that the delay does not come from the loop that enforces the timeout, in which case this would make no practical difference.
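A variant of that idea that keeps early returns is to poll with a zero timeout in a loop and stop at the deadline. The sketch below is generic and assumes a non-blocking getter is passed in; `get_with_deadline`, `try_get`, `NOT_READY` and `poll_interval` are hypothetical names, not Ray APIs. Timing each `try_get` call on its own would also show whether the extra latency sits inside the zero-timeout call itself, which is the case this approach cannot fix.

```python
import time
from typing import Callable

# Hypothetical sentinel: try_get returns this when the result is not ready yet.
NOT_READY = object()


def get_with_deadline(
    try_get: Callable[[], object],
    timeout: float,
    poll_interval: float = 0.005,
) -> object:
    """Poll a non-blocking getter until it succeeds or `timeout` seconds pass.

    Only the sleeps between polls are bounded here; time spent inside
    try_get itself is not.
    """
    deadline = time.perf_counter() + timeout
    while True:
        result = try_get()
        if result is not NOT_READY:
            return result
        remaining = deadline - time.perf_counter()
        if remaining <= 0:
            raise TimeoutError(f"result not ready after {timeout} s")
        # Never plan to sleep past the deadline (the OS may still wake us late).
        time.sleep(min(poll_interval, remaining))


# Possible Ray adapter, assuming a Ray version where ray.get(ref, timeout=0)
# raises GetTimeoutError immediately if the object is not ready:
#
#     def ray_try_get(ref):
#         try:
#             return ray.get(ref, timeout=0)
#         except GetTimeoutError:
#             return NOT_READY
#
#     get_with_deadline(lambda: ray_try_get(remote), timeout=mean_time)
```

Using a sentinel instead of `None` keeps the helper correct for tasks that legitimately return `None`.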
A few other services also use Redis; could that be contributing to the delay?