I also asked this question on StackOverflow but will try it here as well:
I am completely new to Ray and am trying to understand runtimes. I want to run the YOLO object detector from OpenCV. I wrote the following code to benchmark performance:
```python
@ray.remote
def f(x):
    a, b, c = cv.detect_common_objects(x)

# reading in frame etc. omitted
# num_runs is the number of benchmarking runs
# num_parallel is the number of parallel OpenCV function calls in Ray

ray.init(num_cpus=min(16, num_parallel))

# run with Ray in parallel
for j in range(num_runs):
    start = time.time()
    result_ids = []
    for i in range(num_parallel):
        result_ids.append(f.remote(frame))
    results = ray.get(result_ids)
    end = time.time()
    print(end - start)

# benchmark without Ray: sequentially call the OpenCV function
for j in range(num_runs):
    start_noray = time.time()
    for i in range(num_parallel):
        a, b, c = cv.detect_common_objects(frame)
    end_noray = time.time()
    print(end_noray - start_noray)
```
I’m running on a 16 core CPU. After warm up, the runtimes look as follows:
| | Ray | No Ray, sequential |
|---|---|---|
If I run `top` in another shell, it tells me that Ray is using 100% of the CPU in all 3 cases (also when `num_parallel = 1`). I am trying to understand these runtimes now:
The OpenCV function allows for parallel execution. The only explanation I can find for these runtimes is that each Ray worker is always placed on 2 CPU cores. But shouldn't the CPU utilization in `top` be lower then? Also, is there a way to place a worker on more CPU cores, as happens when the function is called without Ray?
PS: I also omitted `num_cpus=min(16, num_parallel)` in `ray.init()` and this didn't change the runtimes. I keep it now to make sure idle Ray processes don't push up the CPU utilization.