How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am trying to migrate my project from Ray 1.11 to 2.40, mainly because of numerical instability issues that I was able to narrow down to the algorithm implementations in RLlib. Migrating to the new API stack has been a bit of a challenge, but after a few days of trying I can finally get things to run, and so far I have not noticed the numerical instability issues on 2.40, which is a good sign. However, my full training routine involves performing HPO with PBT.
I define the PBT object as follows:
```python
from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=500,
    metric="evaluation/episode_reward_mean",
    mode="max",
    hyperparam_mutations={...},  # hyperparam ranges and lists go here
    custom_explore_fn=postprocess_func,
)
```
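(`postprocess_func` is my custom explore function; its actual contents are not relevant to this issue. As far as I understand the PBT API, it is called with the mutated config and should return the possibly adjusted config, so it has roughly this shape:)

```python
# Rough shape of my custom explore function (actual post-processing logic omitted).
# PBT calls this with the config produced by the built-in perturbations and uses
# whatever config dict it returns.
def postprocess_func(config):
    # e.g. clamp or round mutated hyperparameters to valid values here
    return config
```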
Here is my tune config:
```python
import os

from ray import air, tune
from ray.rllib.algorithms.ppo import PPOConfig

algo_config = (
    PPOConfig()
    .environment(env=ENV_NAME, env_config=env_config)
    .evaluation(
        evaluation_config=eval_env_config,
        evaluation_num_env_runners=4,
        evaluation_interval=500,
        evaluation_duration=500,
        evaluation_duration_unit="episodes",
    )
    .env_runners(num_env_runners=5)
    .framework("torch")
    .rollouts(rollout_fragment_length=200)
    .training(train_batch_size=2000, num_sgd_iter=5)
)

tuner = tune.Tuner(
    "PPO",
    param_space=algo_config.to_dict(),
    run_config=air.RunConfig(
        name="experiment_4",
        storage_path=f"file://{os.getcwd()}/experiments/initial_testing",
        stop={"training_iteration": 10000},
    ),
    tune_config=tune.TuneConfig(num_samples=6, scheduler=pbt),
)
tuner.fit()
```
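My understanding (an assumption on my part) is that the `/` in the metric name simply mirrors nesting in the result dict that each trial reports, i.e. something like:

```python
# Assumption: a "/"-separated metric name is resolved against the nested
# result dict reported by each trial, e.g.:
result = {"evaluation": {"episode_reward_mean": 123.4}}

value = result
for key in "evaluation/episode_reward_mean".split("/"):
    value = value[key]

print(value)  # -> 123.4
```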
I specifically want the mutation decisions to be based on the evaluation reward means, and as far as I can tell this used to work perfectly fine on 1.11. However, on 2.40 I am getting a strange error with the following signature:
File "/home/{}/miniconda3/envs/rluav2/lib/python3.9/site-packages/ray/tune/execution/tune_controller.py", line 1678, in _validate_result_metrics
raise ValueError(
ValueError: Trial returned a result which did not include the specified metric(s) `evaluation/episode_reward_mean` that `PopulationBasedTraining` expects. Make sure your calls to `tune.report()` include the metric, or set the TUNE_DISABLE_STRICT_METRIC_CHECKING environment variable to 1. Result: {'num_training_step_calls_per_iteration': 1 ....
and it goes on. So evidently, the metric `evaluation/episode_reward_mean` does not exist within this Result object. I already tried the following:
- I tried setting the metric to `episode_reward_mean` (without the `evaluation/` prefix), which is what the example given in the docs uses.
- On TensorBoard, after training commences (when I try without setting the scheduler to PBT), I can see a card labeled "ray/tune/env_runners/episode_return_mean", so I have tried setting the metric to `env_runners/episode_return_mean`, `episode_return_mean`, and `ray/tune/env_runners/episode_return_mean`. All of these give the same error, as none of them are in the Result object returned initially (see the inspection sketch after this list).
- I can get rid of the error by setting the metric to something that is present in it, like `env_runners/sample`, but this does not help me at all.
- I also tried setting `os.environ["TUNE_DISABLE_STRICT_METRIC_CHECKING"] = "1"` within the script, which is suggested by the initial error message, but I get a slightly different error message with essentially the same content:
```
raise RuntimeError(
RuntimeError: Cannot find metric env_runnes/episode_return_mean in trial result {'num_training_step_calls_per_iteration': 1, ...
```
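What I plan to try next, to see which keys a trial actually reports (the inspection sketch mentioned in the second bullet above), is to run a single training iteration outside of Tune and print every nested result key as a `/`-joined path; a minimal sketch using the config above:

```python
# Minimal sketch (my own debugging idea, not from the docs): run one training
# iteration outside of Tune with the same algo_config and print all nested
# result keys, to see what PBT could actually use as its `metric`.
algo = algo_config.build()
result = algo.train()

def print_metric_paths(d, prefix=""):
    # Recursively walk the result dict and print "/"-joined key paths.
    for k, v in d.items():
        path = f"{prefix}{k}"
        if isinstance(v, dict):
            print_metric_paths(v, prefix=path + "/")
        else:
            print(path)

print_metric_paths(result)
algo.stop()
```

One thing I do notice is that, with `evaluation_interval=500`, the `evaluation/...` keys presumably only appear in the results of iterations where evaluation actually runs, which might be why the very first Result object does not contain them.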
With 1.11, I also used to pass the option `"always_attach_evaluation_results": True` in the config dictionary, but when I pass this inside the evaluation options of `algo_config`, I get an error:
```
raise ValueError(msg)
ValueError: `AlgorithmConfig.evaluation(always_attach_evaluation_results=..)` has been deprecated. This setting is no longer needed, b/c Tune does not error anymore (only warns) when a metrics key can't be found in the results.
```
and it does not proceed.
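Concretely, what I tried on 2.40 (and what triggers the deprecation error quoted above) was roughly:

```python
# Roughly what I tried on 2.40; this call is what raises the ValueError above.
# (On 1.11 the equivalent was just "always_attach_evaluation_results": True in
# the flat config dict I passed to Tune.)
algo_config.evaluation(always_attach_evaluation_results=True)
```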
In summary: I want to perform HPO with PBT driven by the evaluation episode reward mean, but I cannot get training to proceed with that metric (or any other useful one).
Any help would be appreciated.