The relationship between the number of workers and processing time

When I increase the number of workers to n, the execution time usually drops to roughly 1/n of the single-worker execution time. However, in my experiment, once I increased the number of workers beyond 12, the execution time no longer dropped significantly. The machine I was running on has 80 CPU cores, and the experimental code assigns one CPU core to each worker. Even when I increased the amount of test data, the speedup still plateaued at 12 workers. What causes this phenomenon? Is there some worker process that cannot run in parallel?
The code is as follows:

    import time
    import ray

    ray.init(num_cpus=40)

    start_time = time.time()
    # parallelism controls how many blocks the input is split into
    transformed_ds = ray.data.from_items(rgb_list, parallelism=12)
    transformed_ds = transformed_ds.map(
        lambda index: get_tile_parallel_rgb_map(
            index, path_rgb, tile_offsets_rgb, tile_byte_counts_rgb),
        concurrency=12)
    result = transformed_ds.take_all()
    end_time = time.time()
    print("total time: " + str(end_time - start_time))

Can you paste your Ray dashboard output so we can see the resource allocation on your Ray cluster as well as the utilization of each worker?
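
If grabbing screenshots of the dashboard is inconvenient, a rough equivalent can be printed programmatically with Ray's standard resource APIs (this is just a minimal sketch, assuming the script is run while the cluster from your experiment is still up):

    import ray

    ray.init(address="auto")  # Connect to the already-running cluster.
    print("Cluster resources:  ", ray.cluster_resources())    # Total CPUs/GPUs/memory Ray sees.
    print("Available resources:", ray.available_resources())  # What is currently unclaimed.

That output, together with the per-worker CPU utilization from the dashboard, would help tell whether the extra workers are actually being scheduled.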