How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi,
I am attempting to run hyperparameter optimization (HPO) with Ray Tune on a Soft Actor-Critic (SAC) RL agent that is being trained on a custom environment. I am using Ray 2.40 and Python 3.11.
from ray import tune
from ray.rllib.algorithms.sac import SACConfig
from ray.tune import TuneConfig
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch

# Env.custom_env and env_config are defined elsewhere in my project.
sac_config = (
    SACConfig()
    .environment(
        env=Env.custom_env,
        env_config=env_config,
        disable_env_checking=True,
    )
    .env_runners(
        num_env_runners=5,
        num_gpus_per_env_runner=0,
        num_cpus_per_env_runner=2,
    )
    .learners(
        num_learners=1,
        num_gpus_per_learner=0.25,
    )
    .framework("torch")
    .training(
        initial_alpha=0.2,
        target_entropy="auto",
        twin_q=True,
    )
)

# Define the search space.
search_space = {
    "gamma": tune.uniform(0.9, 0.999),                 # Discount factor
    "actor_lr": tune.loguniform(1e-5, 1e-3),           # Actor learning rate
    "critic_lr": tune.loguniform(1e-5, 1e-3),          # Critic learning rate
    "train_batch_size": tune.choice([128, 256, 512]),  # Batch size
    "tau": tune.uniform(0.005, 0.05),                  # Soft update coefficient
}

search_alg = OptunaSearch(
    metric="episode_reward_mean",
    mode="max",
)

# ASHA scheduler for early stopping.
asha_scheduler = ASHAScheduler(
    metric="episode_reward_mean",
    mode="max",
    max_t=150,
    grace_period=100,
    reduction_factor=2,
)

tuner = tune.Tuner(
    "SAC",
    param_space={**sac_config.to_dict(), **search_space},
    tune_config=TuneConfig(
        num_samples=1,             # Number of trials
        search_alg=search_alg,
        scheduler=asha_scheduler,  # Use ASHA for early stopping
        max_concurrent_trials=4,
    ),
)
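The run is then launched in the usual way (a minimal sketch; I have not set any RunConfig or extra stopping criteria beyond the scheduler):

# Launch the trials; the error below is raised once the first results are reported.
results = tuner.fit()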
When I run the above code, I get the following error:
ValueError: Trial returned a result which did not include the specified metric(s) `episode_reward_mean` that `AsyncHyperBandScheduler` expects
I looked at the result JSON and saw no key that looked like a reward or “return”. I am guessing something changed in Ray 2.40.
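For reference, this is roughly the kind of check I ran against one of the trial result files (the path below is just a placeholder for a trial directory under my ray_results folder):

import json

# result.json is JSON-lines, one result dict per line; read the first reported result.
# The path is a placeholder for an actual trial directory.
with open("/path/to/ray_results/SAC_.../result.json") as f:
    first_result = json.loads(f.readline())

# List any top-level keys that look like a reward/return metric.
print([k for k in first_result if "reward" in k or "return" in k])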
The exact same code ran without a glitch on Ray 2.10, although I did have to remove the env_runners and learners calls from the config, since Ray 2.10 does not have learners and env runners.
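In other words, the Ray 2.10 config was the same builder chain minus those two calls, roughly:

# Ray 2.10 version of the config: identical, but without .env_runners() and .learners(),
# which that version's SACConfig does not have.
sac_config = (
    SACConfig()
    .environment(env=Env.custom_env, env_config=env_config, disable_env_checking=True)
    .framework("torch")
    .training(initial_alpha=0.2, target_entropy="auto", twin_q=True)
)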
Hoping one of you guys can help me out with this.