Thanks for the timely reply.
However, after setting TUNE_DISABLE_STRICT_METRIC_CHECKING=1
and running an ordinary call such as
    tune.run(
        trail,
        metric="eval/return",
        mode="max",
        search_alg=OptunaSearch(),
        scheduler=AsyncHyperBandScheduler(
            max_t=TOTAL_EPOCH, grace_period=math.floor(TOTAL_EPOCH / 2)
        ),
        resources_per_trial={"cpu": 1, "gpu": 1 / 8},
        max_concurrent_trials=1,
        config=params,
        num_samples=60,
        verbose=1,
        progress_reporter=MyReporter(
            args.model + datetime.now().strftime("%m-%d:%H:%M:%S:%s")
        ),
        sync_config=tune.SyncConfig(syncer=None),
        local_dir=path.abspath("./ray"),
    )
the run crashes with the following traceback:
Traceback (most recent call last):
  File "/home/bkk/openai-gym/working_algms/transformers/tune.py", line 143, in <module>
    tune.run(
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/tune.py", line 718, in run
    runner.step()
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 778, in step
    self._wait_and_handle_event(next_trial)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 757, in _wait_and_handle_event
    raise TuneError(traceback.format_exc())
ray.tune.error.TuneError: Traceback (most recent call last):
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 745, in _wait_and_handle_event
    self._on_training_result(
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 870, in _on_training_result
    self._process_trial_results(trial, result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 954, in _process_trial_results
    decision = self._process_trial_result(trial, result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 1003, in _process_trial_result
    self._search_alg.on_trial_result(trial.trial_id, flat_result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/suggest/search_generator.py", line 135, in on_trial_result
    self.searcher.on_trial_result(trial_id, result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/suggest/suggestion.py", line 549, in on_trial_result
    self.searcher.on_trial_result(trial_id, result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/suggest/optuna.py", line 483, in on_trial_result
    metric = result[self.metric]
KeyError: 'eval/return'
This is exactly what I was worried about: the search algorithm (AlgmSearcher) fails to find the metric in the result passed to tune.report and crashes.
Is this specific to OptunaSearch()?
Do you know of any other search algorithm (or scheduler) that would work fine under this condition?
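
For reference, the KeyError in the traceback is raised at `metric = result[self.metric]`, i.e. when a reported result has no 'eval/return' key. One possible workaround, given here only as a sketch and not as a confirmed fix, is to make every reported result carry that key by reusing the last evaluated value on epochs where no evaluation ran. The helper name fill_missing_metric and the example result dicts are hypothetical; only the metric name comes from the call above.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def fill_missing_metric(
    results: Iterable[Dict[str, Any]],
    key: str = "eval/return",
    default: Optional[float] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield result dicts in which `key` is always present.

    If a result lacks `key`, the most recent known value is reused.
    Results seen before any value is known are skipped unless a
    `default` is given (an assumption of this sketch, not Ray behavior).
    """
    last = default
    for result in results:
        if key in result:
            # Remember the freshly evaluated value and pass the result through.
            last = result[key]
            yield dict(result)
        elif last is not None:
            # No evaluation this epoch: carry the previous value forward.
            yield {**result, key: last}
        # Otherwise nothing is known yet, so this result is not reported.


# Hypothetical per-epoch results: evaluation only runs on some epochs.
epoch_results = [
    {"train/loss": 1.0},
    {"train/loss": 0.9, "eval/return": 5.0},
    {"train/loss": 0.8},
]

safe_results = list(fill_missing_metric(epoch_results))
# Inside a trainable, each dict could then be passed on, e.g.
# tune.report(**result) with the legacy function API (assumed here).
```

The trade-off is that carried-forward values make an un-evaluated epoch look like it repeated the previous score, so the scheduler sees a flat curve between evaluations.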