How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Description:
I am running a tuning job with multiple concurrent trials using Ray Tune (Ray 2.6.1). The server resources are sufficient, but when the code that starts the run and the code that defines it are not in the same file, max_concurrent_trials has no effect.
In simple terms, python a.py runs trials concurrently as expected, but python b.py does not.
a.py:
import os
import tempfile
import time
import unittest

import ray
from catboost import CatBoostClassifier
from ray import tune, train, air
from ray.air import RunConfig, session, Checkpoint, CheckpointConfig
from hyperopt import hp
from ray.tune import ExperimentAnalysis
from ray.tune.search import ConcurrencyLimiter
from ray.tune.search.hyperopt import HyperOptSearch
from sklearn.metrics import f1_score, recall_score, precision_score

from automl.automl.modeling.hpo.ray.callback import LogInfoCallback
from automl.automl.modeling.hpo.ray.reporter import LogInfoReporter


class RayHPO:
    def train(self) -> None:
        os.environ["RAY_AIR_NEW_OUTPUT"] = "0"
        space = {
            "verbose": hp.choice("verbose", [False]),
            "learning_rate": hp.uniform("learning_rate", 5e-3, 0.2),
            "depth": hp.randint("depth", 5, 8),
        }
        ray.init(num_cpus=3, include_dashboard=True, logging_level='error')
        hyperopt_search = HyperOptSearch(space, metric="f1", mode="max")
        reporter = LogInfoReporter(infer_limit=5, max_report_frequency=15)
        callbacks = [LogInfoCallback(metric="f1")]
        tuner = tune.Tuner(
            trainable_demo,
            tune_config=tune.TuneConfig(
                num_samples=20,
                search_alg=hyperopt_search,
                metric="f1",
                mode="max",
                max_concurrent_trials=4,
            ),
            run_config=air.RunConfig(
                storage_path="/mnt/disk1/tmp/ray_results",
                name="con",
                callbacks=callbacks,
                progress_reporter=reporter,
                verbose=2,
            ),
        )
        tuner.fit()


def trainable_demo(config):
    time.sleep(3)
    session.report({"f1": 0.8, "auc": 0.8})


if __name__ == '__main__':
    hpo = RayHPO()
    hpo.train()
From the picture, it can be seen that two trials are executed every 3 seconds.
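To put a number on what "two trials every 3 seconds" should mean for the whole run, here is a rough estimate of the wall-clock time for the 20 samples above. This is only a sketch: the helper name expected_wall_time is made up for illustration, and it assumes every trial takes exactly the 3 second sleep and that trials start in full waves, ignoring Ray's scheduling and startup overhead.

```python
import math


def expected_wall_time(num_samples: int, trial_seconds: float, concurrency: int) -> float:
    """Idealized run time when trials execute in waves of `concurrency`.

    Hypothetical helper, not part of Ray. Ignores scheduling overhead.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Number of waves needed to get through all samples.
    waves = math.ceil(num_samples / concurrency)
    return waves * trial_seconds


if __name__ == "__main__":
    # Values taken from the script above: num_samples=20, time.sleep(3).
    print("two at a time:", expected_wall_time(20, 3.0, 2))
    print("one at a time:", expected_wall_time(20, 3.0, 1))
```

Under these assumptions, a run that takes roughly twice as long through b.py as through a.py would be consistent with the trials running one at a time.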
b.py:
from a import RayHPO

if __name__ == '__main__':
    hpo = RayHPO()
    hpo.train()
From the picture, it can be seen that only one trial is executed every 3 seconds. The trials are executed in sequence.
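Since the difference is only visible in the progress output, one way to confirm it independently is to record a start and end timestamp for each trial (for example with time.time() inside the trainable) and compute how many trials overlapped at the peak. The sketch below is plain Python and does not call Ray. The function name peak_concurrency and the sample timestamps are hypothetical, not taken from the actual runs.

```python
from typing import Iterable, Tuple


def peak_concurrency(intervals: Iterable[Tuple[float, float]]) -> int:
    """Return the largest number of (start, end) intervals active at once."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a trial starts
        events.append((end, -1))    # a trial ends
    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so back-to-back trials are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


if __name__ == "__main__":
    # Made-up timestamps in seconds, for illustration only.
    sequential = [(0.0, 3.0), (3.0, 6.0), (6.0, 9.0)]
    overlapping = [(0.0, 3.0), (0.1, 3.1), (3.0, 6.0)]
    print("sequential peak:", peak_concurrency(sequential))
    print("overlapping peak:", peak_concurrency(overlapping))
```

A peak of 1 for the b.py run and a higher value for the a.py run would match what the two pictures show.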