When tuning trials that require little computation power, CPU utilization is low

My experiment is similar to the grid search example:

from ray import tune
from ray.tune.tune_config import TuneConfig

# 1. Define an objective function.
def objective(config):
    score = config["a"] ** 2 + config["b"]
    return {"score": score}

# 2. Define a search space.
search_space = {
    "a": tune.grid_search(list(range(10000))),
    "b": tune.choice([1, 2, 3]),
}

# 3. Start a Tune run and print the best result.
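tuner = tune.Tuner(
    objective,
    param_space=search_space,
    # Actor reuse enabled, matching the setting described below.
    tune_config=TuneConfig(reuse_actors=True),
)
results = tuner.fit()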
print(results.get_best_result(metric="score", mode="min").config)

CPU utilization is only ~10% with reuse_actors=True.
As shown in htop, only a single core reaches 100%; the other cores are almost idle.
Although native Python multiprocessing gives better utilization and lower overhead, I still want to use ray.tune features such as resuming an experiment.
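
For comparison, a plain multiprocessing baseline for the same toy objective might look like the sketch below. It is only an illustration: the helper names (make_configs, best_of, run_grid), the pool size, the chunk size, and the fixed seed are all assumptions, and "b" is sampled with the standard random module rather than tune.choice.

```python
import random
from multiprocessing import Pool


def objective(config):
    # Same toy objective as in the Tune example.
    return {"score": config["a"] ** 2 + config["b"]}


def make_configs(n_points, seed=0):
    # One config per grid value of "a"; "b" is drawn at random from
    # [1, 2, 3] (a stand-in for tune.choice, assumed for illustration).
    rng = random.Random(seed)
    return [{"a": a, "b": rng.choice([1, 2, 3])} for a in range(n_points)]


def best_of(configs, results):
    # Return the (config, score) pair with the lowest score.
    scores = [r["score"] for r in results]
    best = min(range(len(scores)), key=scores.__getitem__)
    return configs[best], scores[best]


def run_grid(n_points=10000, processes=4, chunksize=256):
    # A large chunksize sends many configs to a worker per task, which
    # keeps the per-task dispatch overhead small for cheap objectives.
    configs = make_configs(n_points)
    with Pool(processes=processes) as pool:
        results = pool.map(objective, configs, chunksize=chunksize)
    return best_of(configs, results)


if __name__ == "__main__":
    best_config, best_score = run_grid()
    print(best_config, best_score)
```

The chunksize argument is the main reason this stays cheap: each worker receives a batch of configs at once instead of one task per config.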

Is there any setting that can reduce the overhead?
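
One possible workaround, independent of any Tune setting, is to evaluate several grid points inside a single trial so that the per-trial scheduling cost is amortized. The sketch below is an assumption-laden illustration, not a recommended Ray pattern: objective_batched and the "start" and "size" config keys are hypothetical names, and "b" is shared across the whole block instead of being sampled per grid point.

```python
def objective_batched(config):
    # Evaluate a contiguous block of "a" values in one trial and report
    # only the best point of the block.
    # "start" and "size" are hypothetical config keys for this sketch.
    start, size, b = config["start"], config["size"], config["b"]
    best_a = min(range(start, start + size), key=lambda a: a ** 2 + b)
    return {"score": best_a ** 2 + b, "best_a": best_a}


if __name__ == "__main__":
    # One block of 100 grid points evaluated as a single unit of work.
    print(objective_batched({"start": 100, "size": 100, "b": 2}))
```

With a block size of 100, a search space such as "start": tune.grid_search(list(range(0, 10000, 100))) would produce 100 trials instead of 10000, at the cost of coarser per-point results and coarser resume granularity.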