Understanding runtimes and task placement on hardware


I also asked this question on StackOverflow but will try it here as well:

I am completely new to Ray and am trying to understand its runtimes. I want to run the YOLO object detector from OpenCV. I wrote the following code to benchmark performance:

@ray.remote
def f(x):
    a, b, c = cv.detect_common_objects(x)
    return a, b, c

# reading in frame etc. omitted
# num_runs is the number of benchmarking runs 
# num_parallel is the number of parallel OpenCV function calls in Ray

ray.init(num_cpus=min(16, num_parallel))

# run with Ray in parallel
for j in range(num_runs):

    start = time.time()

    result_ids = []
    for i in range(num_parallel):
        result_ids.append(f.remote(frame))
    results = ray.get(result_ids)

    end = time.time()

# benchmark without Ray: Sequentially call OpenCV function
for j in range(num_runs):

    start_noray = time.time()
    for i in range(num_parallel):
        a, b, c = cv.detect_common_objects(frame)

    end_noray = time.time()

I’m running on a 16-core CPU. After warm-up, the runtimes look as follows:

num_parallel   Ray       No Ray, sequential
1              1900 ms   350 ms
8              2000 ms   2900 ms
16             4000 ms   6000 ms
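To make sense of the table, I also computed the implied speedup of Ray over the sequential baseline (these are just the values from the table above):

```python
# Runtimes from the table above, in milliseconds
ray_ms = {1: 1900, 8: 2000, 16: 4000}
seq_ms = {1: 350, 8: 2900, 16: 6000}

for n in ray_ms:
    # Speedup of the Ray version over the sequential baseline
    speedup = seq_ms[n] / ray_ms[n]
    print(f"num_parallel={n:2d}: speedup = {speedup:.2f}x")
# num_parallel= 1: speedup = 0.18x
# num_parallel= 8: speedup = 1.45x
# num_parallel=16: speedup = 1.50x
```

So even in the best case I only get about 1.5x, nowhere near the 16x I would hope for on 16 cores.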

If I run top in another shell, it shows Ray using 100% of the CPU in all three cases (even when num_parallel = 1). I am now trying to understand these runtimes:

The OpenCV function itself supports parallel execution. The only explanation I can find for these runtimes is that each Ray worker is always placed on 2 CPU cores. But shouldn’t the CPU utilization reported by top be lower then? Also, is there a way to place a worker on more CPU cores, as happens when the function is called directly without Ray?
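For comparison, this is the shape of the same benchmark done with only the standard library and a thread pool. Note that `detect` here is a pure-Python stand-in I made up, and since it holds the GIL it will not actually scale across cores the way a GIL-releasing OpenCV call would; it only illustrates the measurement harness:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def detect(frame):
    # Stand-in for cv.detect_common_objects. A real OpenCV call releases
    # the GIL, so threads can then run it on multiple cores in parallel;
    # this pure-Python loop cannot.
    return sum(i * i for i in range(50_000))

def bench(num_parallel, num_runs=3):
    # Best wall-clock time over num_runs for num_parallel pooled calls,
    # and for the same work done sequentially.
    frame = None  # placeholder; the real code would pass an image here
    best_pool = best_seq = float("inf")
    with ThreadPoolExecutor(max_workers=num_parallel) as pool:
        for _ in range(num_runs):
            start = time.time()
            list(pool.map(detect, [frame] * num_parallel))
            best_pool = min(best_pool, time.time() - start)

            start = time.time()
            for _ in range(num_parallel):
                detect(frame)
            best_seq = min(best_seq, time.time() - start)
    return best_pool, best_seq

pool_t, seq_t = bench(8)
print(f"pool: {pool_t * 1000:.0f} ms, sequential: {seq_t * 1000:.0f} ms")
```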

PS: I also tried omitting num_cpus=min(16, num_parallel) in ray.init(), and this didn’t change the runtimes. I keep it now to make sure idle Ray processes don’t push up the CPU utilization.