I like the interface and functionality of Ray Tune. However, I've observed a large per-task overhead when I run non-CPU-bound tasks (the actual compute happens on a 3rd-party service). I understand this is a non-standard use case for Ray Tune, but there are likely settings that would reduce the per-task overhead.
My code:

```python
import time

import ray
from ray import tune

ray.init(dashboard_host="0.0.0.0", include_dashboard=True, num_cpus=1)

def train_model(config):
    print("started")
    time.sleep(5)  # stands in for the call to the 3rd-party service
    print(f"done {config}")
    score = sum(map(int, config.values()))
    return {"score": score}

config = {"n": tune.uniform(-50, 50)}

analysis = tune.run(
    train_model,
    verbose=False,
    config=config,
    num_samples=400,
    max_concurrent_trials=100,
    # fractional CPU so that 100 trials fit on the single registered CPU
    resources_per_trial={"CPU": 0.01},
)
```
Expected theoretical execution time: 20 seconds (400 trials at 5 seconds each, run by 100 concurrent workers: 400 × 5 / 100 = 20 seconds).
Actual time: 3m22.510s
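For clarity, the expected figure is just batch-scheduling arithmetic, independent of Ray; a minimal pure-Python check of the numbers I quoted:

```python
import math

num_trials = 400
seconds_per_trial = 5
max_concurrent = 100

# With 100 concurrent slots, 400 trials run in ceil(400 / 100) = 4 waves,
# each wave taking the full 5-second sleep.
waves = math.ceil(num_trials / max_concurrent)
ideal_wall_clock = waves * seconds_per_trial
print(ideal_wall_clock)
```

This prints 20 (seconds), versus the ~202 seconds I actually observe, so roughly 0.45 s of overhead per trial on average.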