Is there a way to use tune.with_parameters and tune.with_resources together? Both take a trainable as their first argument. I tried the following, but apparently it's not the right way of using them:
```python
trainable_with_resources = tune.with_resources(trainable, {"cpu": input_param.num_cpu_per_job})
tune.with_parameters(trainable_with_resources, data=data)
```
I put more details in the following issue report:
Opened 10:08 PM, 01 Dec 2022 (UTC). Labels: bug, triage
### What happened + What you expected to happen
I'm trying to use all CPUs of a machine for each trial via

```python
trainable_with_resources = tune.with_resources(trainable, {"cpu": multiprocessing.cpu_count()})
```
I would expect only one trial to run at a time. However, all trials run in parallel when trainable_with_resources is passed as the argument of tune.with_parameters(), i.e.

```python
tune.with_parameters(trainable_with_resources, data=data)
```

It seems the constraints applied by tune.with_resources() are overwritten with the default value (1 CPU per trial) when it is used together with tune.with_parameters().
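The suspected behavior can be illustrated with a toy model. This is NOT Ray's actual implementation; `with_resources`, `with_parameters`, and the `.resources` attribute below are simplified stand-ins. The idea: if the outer wrapper returns a new callable without propagating the metadata attached by the inner wrapper, the resource request is hidden and the consumer falls back to its default.

```python
def with_resources(trainable, resources):
    """Toy stand-in: attach a resource request as metadata on the wrapper."""
    def wrapped(config, **kwargs):
        return trainable(config, **kwargs)
    wrapped.resources = resources  # metadata a scheduler would read
    return wrapped

def with_parameters(trainable, **params):
    """Toy stand-in: inject extra kwargs. It does NOT copy metadata
    (like .resources) from the inner callable to the new wrapper."""
    def wrapped(config):
        return trainable(config, **params)
    return wrapped

def objective(config, data=None):
    return config["a"] + (len(data) if data is not None else 0)

# Order used in this report: with_parameters wraps with_resources,
# so the resource request sits on the hidden inner callable.
lost = with_parameters(with_resources(objective, {"cpu": 8}), data="abc")
print(getattr(lost, "resources", None))  # None -> falls back to the default

# Reversed order: with_resources is outermost, so the request is visible.
kept = with_resources(with_parameters(objective, data="abc"), {"cpu": 8})
print(kept.resources)  # {'cpu': 8}
```

If this toy model matches what happens in Ray, the analogous workaround would be to apply tune.with_parameters first and tune.with_resources last; I have not verified that ordering against Ray 2.1.0 here.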
### Versions / Dependencies
Ray 2.1.0
### Reproduction script
```python
import os, time, ray
import pandas as pd
import multiprocessing
from ray import tune, air
from hyperopt import hp
from ray.tune.search.hyperopt import HyperOptSearch
from ray.tune.search import ConcurrencyLimiter

# disable dashboard. See: https://discuss.ray.io/t/disable-dashboard/8471
ray.init(include_dashboard=False)

# 1. Define an objective function.
def objective(config, data=None):
    time.sleep(10)
    f = open(os.path.join(os.getcwd(), "oo.root"), "w")
    for i in range(10):  # 10 epochs
        score = config["a"] ** 2 + config["b"]
        # The first appearance of SCORE defines the key of the dictionary,
        # i.e. metric="SCORE", or return {"SCORE": score}
        tune.report(SCORE=score)  # this or the following return should work
        # return {"SCORE": score}

# 2. Define a search space.
search_space = {
    "a": hp.uniform("a", 0, 1),
    "b": hp.uniform("b", 0, 1),
}

raw_log_dir = "./ray_log"
raw_log_name = "example"

algorithm = HyperOptSearch(search_space, metric="SCORE", mode="max", n_initial_points=1)
algorithm = ConcurrencyLimiter(algorithm, max_concurrent=8)

if not os.path.exists(os.path.join(raw_log_dir, raw_log_name)):
    print('--- this is the 1st run ----')
else:  # note: restoring as described here doesn't work: https://docs.ray.io/en/latest/tune/tutorials/tune-stopping.html
    print('--- previous run exists, continue the tuning ----')
    algorithm.restore_from_dir(os.path.join(raw_log_dir, raw_log_name))

# 3. Start a Tune run and print the best result.
trainable_with_resources = tune.with_resources(objective, {"cpu": multiprocessing.cpu_count()})
tuner = tune.Tuner(
    tune.with_parameters(trainable_with_resources, data="aaaa"),
    tune_config=tune.TuneConfig(
        num_samples=8,  # number of trials; more is too expensive for Brian2
        time_budget_s=10000,  # total running time in seconds
        search_alg=algorithm,
    ),
    run_config=air.RunConfig(local_dir=raw_log_dir, name=raw_log_name),  # where to save the logs, loaded later
)
results = tuner.fit()
best_result = results.get_best_result(metric="SCORE", mode="max")
```
### Issue Severity
High: It blocks me from completing my task.
It turns out the issue has already been spotted and will be fixed in Ray 2.2. I'm closing this thread.