Is reporting without the metric something to avoid?

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi All

Thanks for your attention, and please excuse any mistakes, as this is my first post.

I just came across ray.tune several days ago and really love it.

My question is: when faced with the error

ValueError: Trial returned a result which did not include the specified metric(s) `eval_return` that `tune.run()` expects. Make sure your calls to `tune.report()` include the metric, or set the TUNE_DISABLE_STRICT_METRIC_CHECKING environment variable to 1.

if I set TUNE_DISABLE_STRICT_METRIC_CHECKING=1, would it hinder ray.tune from choosing better params for me?

I'd really like to report some other metrics to TensorBoard more often than I can produce the tuned metric value.

But I'm worried that setting this environment variable would break something.

Thanks a lot in advance!

Hmm, it’s there more as a suggestion: it safeguards against you forgetting to return that metric at some point. It’s safe to turn off if you know what you’re doing :slight_smile:
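For concreteness, here is a minimal sketch of the intended pattern: auxiliary metrics reported every step, the tuned metric only on occasional eval steps, with the strict check disabled. The `report` function below is a stand-in for Ray Tune's report call so the schedule can run without Ray installed; `EVAL_EVERY`, `grad_norm`, and `eval_return` are illustrative names, not anything Ray requires.

```python
import os

# Disabling the strict check means results without the tuned metric
# no longer raise ValueError (assumption: this mirrors the user's setup).
os.environ["TUNE_DISABLE_STRICT_METRIC_CHECKING"] = "1"

# Stand-in for Ray Tune's report call; in real code each dict below
# would be passed to tune.report(...) inside the trainable.
reported = []
def report(**metrics):
    reported.append(metrics)

EVAL_EVERY = 5  # hypothetical: compute the tuned metric every 5th step
for step in range(10):
    metrics = {"grad_norm": 0.1}        # auxiliary metric, logged every step
    if (step + 1) % EVAL_EVERY == 0:
        metrics["eval_return"] = 0.0    # tuned metric, only on eval steps
    report(**metrics)

print(sum("eval_return" in m for m in reported))  # 2 of 10 results carry it
```

With the check left on, the 8 results lacking `eval_return` would each trigger the ValueError above.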

Thanks rliaw

Does this mean that any param searcher / trial scheduler would also work well even if I don’t report the metric in certain calls?

Anyway, I’m gonna give it a try and see if anything is broken or misbehaving :smiley:


It should work well as long as you report it at some point :slight_smile:

Let me know how it goes!

Thanks for the timely reply

However, after setting TUNE_DISABLE_STRICT_METRIC_CHECKING=1
and running a normal tune.run call containing
        max_t=TOTAL_EPOCH, grace_period=math.floor(TOTAL_EPOCH/2)
    resources_per_trial={"cpu": 1, "gpu": 1 / 8},
    progress_reporter=MyReporter(args.model +"%m-%d:%H:%M:%S:%s")),

a crash happens:

Traceback (most recent call last):
  File "/home/bkk/openai-gym/working_algms/transformers/", line 143, in <module>
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/", line 718, in run
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/", line 778, in step
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/", line 757, in _wait_and_handle_event
    raise TuneError(traceback.format_exc())
ray.tune.error.TuneError: Traceback (most recent call last):
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/", line 745, in _wait_and_handle_event
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/", line 870, in _on_training_result
    self._process_trial_results(trial, result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/", line 954, in _process_trial_results
    decision = self._process_trial_result(trial, result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/", line 1003, in _process_trial_result
    self._search_alg.on_trial_result(trial.trial_id, flat_result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/suggest/", line 135, in on_trial_result
    self.searcher.on_trial_result(trial_id, result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/suggest/", line 549, in on_trial_result
    self.searcher.on_trial_result(trial_id, result)
  File "/home/bkk/openai-gym/venv/lib/python3.9/site-packages/ray/tune/suggest/", line 483, in on_trial_result
    metric = result[self.metric]
KeyError: 'eval/return'

This is just what I was worried about:
the searcher fails to extract the metric from a result and crashes.

Is this specific to OptunaSearch()?
Do you know of any other searcher (or scheduler?) that would work fine under this condition?

Ah, got it.
I am thinking: could you modify OptunaSearch’s on_trial_result to skip the update if self.metric is not in result?
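A minimal sketch of that idea: a guard that drops results missing the metric before forwarding them to the wrapped searcher. `DummySearcher` is a stand-in so the sketch runs without Ray; in practice you would apply the same mixin to a subclass of OptunaSearch from ray.tune.suggest.optuna, and all class names here are illustrative.

```python
class MetricGuardMixin:
    """Skip intermediate results that do not carry the target metric."""
    def on_trial_result(self, trial_id, result):
        if self.metric not in result:
            return  # nothing useful for the searcher in this result
        super().on_trial_result(trial_id, result)

# Stand-in for a real searcher such as OptunaSearch, used here only
# so the sketch runs without Ray installed.
class DummySearcher:
    def __init__(self, metric):
        self.metric = metric
        self.seen = []
    def on_trial_result(self, trial_id, result):
        self.seen.append(result[self.metric])  # would KeyError without the guard

class SafeSearcher(MetricGuardMixin, DummySearcher):
    pass

searcher = SafeSearcher(metric="eval/return")
searcher.on_trial_result("t1", {"grad_norm": 0.5})    # metric absent: skipped
searcher.on_trial_result("t1", {"eval/return": 1.2})  # metric present: recorded
print(searcher.seen)  # [1.2]
```

Without the mixin, the second class's lookup `result[self.metric]` raises the same KeyError: 'eval/return' seen in the traceback above.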

I think it should work.