Understanding runtimes and task placement on hardware

Hi,

I also asked this question on StackOverflow but will try it here as well:

I am completely new to Ray and am trying to understand runtimes. I want to run the YOLO object detector from OpenCV. I wrote the following code to benchmark performance:

import time
import ray
import cvlib as cv  # detect_common_objects comes from the cvlib package (built on OpenCV)

@ray.remote
def f(x):
    a, b, c = cv.detect_common_objects(x)

# reading in frame etc. omitted
# num_runs is the number of benchmarking runs 
# num_parallel is the number of parallel OpenCV function calls in Ray


ray.init(num_cpus=min(16, num_parallel))

# run with Ray in parallel
for j in range(num_runs):

    start = time.time()

    result_ids = []
    for i in range(num_parallel):
        result_ids.append(f.remote(frame))
    results = ray.get(result_ids)

    end = time.time()
    print(end-start)

# benchmark without Ray: Sequentially call OpenCV function
for j in range(num_runs):

    start_noray = time.time()
    
    for i in range(num_parallel):
        a, b, c = cv.detect_common_objects(frame)

    end_noray = time.time()
    print(end_noray-start_noray)


I’m running on a 16-core CPU. After warm-up, the runtimes look as follows:

| num_parallel | Ray | No Ray, sequential |
|---|---|---|
| 1 | 1900 ms | 350 ms |
| 8 | 2000 ms | 2900 ms |
| 16 | 4000 ms | 6000 ms |

If I run top in another shell, it shows Ray using 100% of the CPU in all 3 cases (even when num_parallel = 1). I am now trying to make sense of these runtimes:

The OpenCV function itself allows for parallel execution. The only explanation I can find for these runtimes is that each Ray worker is always placed on 2 CPU cores. But shouldn’t the CPU utilization in top be lower then? Also, is there a way to place a worker on more CPU cores, like what happens when the function is called directly without Ray?

PS: I also omitted the num_cpus=min(16, num_parallel) in ray.init() and this didn’t change the runtimes. I keep it now to make sure idle Ray processes don’t push up the CPU utilization.

If I understand correctly, you are asking why Ray is always using 100% of the CPU in all 3 cases? @ferdi_kossmann
cc: @jjyao

@ferdi_kossmann thanks for the question

A few things to consider:

  • Each remote call creates a new task that runs in its own worker process. In the example, this allows num_parallel tasks to run in parallel, each in a separate process.
  • num_cpus is only used by the scheduler to decide how many tasks to run concurrently; there is no explicit pinning of a worker process to particular CPU cores.

To throttle the number of parallel tasks, the recommendation is to set num_cpus either for the whole cluster (ray.init) or for individual tasks (ray.remote).

Depending on what the input (frame) is: if it is a large argument, passing it to every task may also impact performance. One way to avoid that is to first write the argument to the Ray object store with ray.put (per ‘Tips’ in Ray Core API — Ray 2.1.0).

Let me know if this helps