Ray Tune and non-CPU-bound tasks

I like the interface and functionality of Ray Tune. However, I've observed a large per-task overhead when the tasks are not CPU-bound (the actual compute is done on a third-party service). I understand that this is a non-standard use case for Ray Tune, but I expect there are settings that reduce the per-task overhead.

My code:

import time
import ray
from ray import tune

ray.init(dashboard_host="0.0.0.0", include_dashboard=True, num_cpus=1)

def train_model(config):
    print("started")
    time.sleep(5)
    print(f"done {config}")
    score = sum(map(int, config.values()))
    return {"score": score}

config = {"n": tune.uniform(-50, 50)}

analysis = tune.run(
    train_model,
    verbose=False,
    config=config,
    num_samples=400,
    max_concurrent_trials=100,
    resources_per_trial={"CPU": 0.01},
)

Expected (theoretical) execution time: 20 seconds (400 tasks of 5 seconds each, run by 100 concurrent workers: 400 * 5 / 100 = 20 seconds).

Actual time: 3m22.510s
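
To put the gap into numbers, here is a small back-of-the-envelope sketch using only the figures quoted above. The variable names are made up for illustration. Dividing the extra time evenly across trials is an assumption, not a measurement of where the time actually goes, but it suggests roughly 0.46 seconds of overhead per trial.

```python
import math

# Figures taken from the post above.
num_trials = 400
trial_seconds = 5
max_concurrent = 100
measured_seconds = 3 * 60 + 22.510  # 3m22.510s

# Ideal wall-clock time: trials run in waves of `max_concurrent`.
ideal_seconds = math.ceil(num_trials / max_concurrent) * trial_seconds

# Everything beyond the ideal time is treated as overhead.
overhead_seconds = measured_seconds - ideal_seconds

# Assumption: the overhead is spread evenly over all trials.
overhead_per_trial = overhead_seconds / num_trials

print(f"ideal: {ideal_seconds} s")
print(f"overhead: {overhead_seconds:.2f} s total, {overhead_per_trial:.3f} s per trial")
```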

Unfortunately, I think the bottleneck here is going to be scheduling. You could try using Ray Core directly for this, which may reduce some of the overhead.