The relationship between the number of workers and processing time

When I increase the number of workers to n, the execution time usually drops to roughly 1/n of the single-worker execution time. However, in my experiment, once I increased the number of workers beyond 12, the execution time no longer dropped significantly. The machine I was running on has 80 CPU cores, and the experimental code assigns one CPU core to each worker. Even when I increased the amount of test data, the speedup still plateaued at 12 workers. What causes this phenomenon? Is there some worker process that cannot run in parallel?
The code is as follows:

    import time
    import ray

    ray.init(num_cpus=40)

    start_time = time.time()
    # parallelism controls how many blocks the input is split into
    transformed_ds = ray.data.from_items(rgb_list, parallelism=12)
    transformed_ds = transformed_ds.map(
        lambda index: get_tile_parallel_rgb_map(
            index, path_rgb, tile_offsets_rgb, tile_byte_counts_rgb),
        concurrency=12)
    result = transformed_ds.take_all()
    end_time = time.time()
    print("total time: " + str(end_time - start_time))

Can you paste your Ray dashboard output so we can see the resource allocation on your Ray cluster as well as the utilization of each worker?
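
If grabbing screenshots of the dashboard is inconvenient, a rough equivalent can be printed programmatically with Ray's standard resource APIs (this is just a minimal sketch, assuming the script is run while the cluster from your experiment is still up):

    import ray

    ray.init(address="auto")  # Connect to the already-running cluster.
    print("Cluster resources:  ", ray.cluster_resources())    # Total CPUs/GPUs/memory Ray sees.
    print("Available resources:", ray.available_resources())  # What is currently unclaimed.

That output, together with the per-worker CPU utilization from the dashboard, would help tell whether the extra workers are actually being scheduled.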