Tasks become slow when the number of submitted tasks is greater than the number of CPUs

This function takes about 5 seconds when run in a single thread:

def generate(*args):
    t1 = time.time()
    l = []
    for i in range(50000000):
        l.append(i)
        if len(l) > 100:
            l = []
    return time.time() - t1

When I submit 100 tasks to Ray, each task takes about 10 seconds.

Here is my code:

import time
import ray

ray.init('ray://my-head-node', log_to_driver=False)

def generate():
    t1 = time.time()
    l = []
    for i in range(50000000):
        l.append(i)
        if len(l) > 100:
            l = []
    return time.time() - t1

for _ in range(5):
    print(generate())  # about 5 seconds per call when run locally

gr = ray.remote(generate)

res = []
for _ in range(100):
    res.append(gr.remote())

r = ray.get(res)

print(r)  # each task reports 10+ seconds

Can anyone explain why? Thanks in advance!

Is it the same if you set num_cpus=2 for each task?
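In case it helps, here is a minimal sketch of that experiment. It assumes the same 'ray://my-head-node' address as in the question. The helper names run_tasks and max_concurrent_tasks and the n parameter on generate are made up for illustration and are not part of Ray. Note that num_cpus is a logical scheduling resource: as far as I know, Ray uses it to decide how many tasks run at once on a node and does not pin a task to physical cores.

```python
import time


def generate(n=50_000_000):
    """Same CPU-bound loop as in the question; returns elapsed seconds."""
    t1 = time.time()
    l = []
    for i in range(n):
        l.append(i)
        if len(l) > 100:
            l = []
    return time.time() - t1


def max_concurrent_tasks(node_cpus, cpus_per_task):
    """Hypothetical helper: how many tasks fit on one node at the same time,
    given the node's logical CPU count and the num_cpus requested per task."""
    return int(node_cpus // cpus_per_task)


def run_tasks(num_tasks=100, cpus_per_task=2):
    """Submit num_tasks copies of generate, each reserving cpus_per_task CPUs."""
    import ray  # imported here so the helpers above can be used without Ray

    # Address taken from the question; adjust for your own cluster.
    ray.init('ray://my-head-node', log_to_driver=False)

    # ray.remote(num_cpus=...) returns a decorator; apply it to the plain function.
    gr = ray.remote(num_cpus=cpus_per_task)(generate)

    refs = [gr.remote() for _ in range(num_tasks)]
    return ray.get(refs)  # list of per-task elapsed times in seconds
```

Calling run_tasks(100, cpus_per_task=2) should halve the number of tasks running at once on each node compared with the default of 1 CPU per task. If the per-task time then drops back toward 5 seconds, that would suggest the tasks were competing for cores (for example, if the node's CPU count includes hyperthreads), but that is only a guess until you measure it.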