I am training an RL model and trying to tune its hyperparameters using PBT, following this example:
https://docs.ray.io/en/latest/tune/examples/pbt_ppo_example.html
It all works like a charm, but I can’t figure out how to tune my custom environment hyperparameters.
The PBT scheduler's hyperparam_mutations seems to accept only the algorithm hyperparameters (PPO in my case), but not the custom environment ones. In the code below, "base_line_degrees" is never sampled from the range (0.1, 2.0); instead it always uses the default value (-1) set in the custom gym env's init method:
import random

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env
from ray.tune.schedulers import PopulationBasedTraining

register_env("custom_env", lambda config: custom_env(config))

env_config = {
    "base_line_degrees": 1,
}

ppo_config = (
    PPOConfig()
    .environment(env="custom_env", env_config=env_config)
)

hyperparam_mutations = {
    "lambda": lambda: random.uniform(0.9, 1.0),
    "clip_param": lambda: random.uniform(0.01, 0.5),
    "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    "base_line_degrees": lambda: random.uniform(0.1, 2.0),
}

pbt = PopulationBasedTraining(
    time_attr="time_total_s",
    perturbation_interval=120,
    resample_probability=0.25,
    hyperparam_mutations=hyperparam_mutations,
)

analysis = tune.run(
    "PPO",
    metric="episode_reward_mean",
    mode="max",
    scheduler=pbt,
    num_samples=1,
    config=ppo_config.to_dict(),
)
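For context, the relevant part of my custom env looks roughly like this (simplified, with the gym.Env boilerplate omitted; -1 is the fallback used whenever the key is missing from the config RLlib passes in):

```python
class custom_env:
    """Simplified sketch of my custom environment (gym.Env details omitted)."""

    def __init__(self, config):
        # Default of -1 is used whenever "base_line_degrees" is absent
        # from the env config dict that RLlib passes in.
        self.base_line_degrees = config.get("base_line_degrees", -1)
```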
I'm clearly not doing it right. It doesn't seem to me that I'm giving PBT any way to access the env_config object, but I can't find any example or documentation on how to do it.
Can custom gym env hyperparams be tuned with PBT? And if so, how should it be done?
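One guess I had, but couldn't confirm from the docs: since ppo_config.to_dict() nests the env settings under an "env_config" key, maybe hyperparam_mutations needs the same nesting so the key path matches, something like:

```python
import random

# Guess (not confirmed by the docs): nest the env hyperparameter under
# "env_config" so its path matches the config dict from ppo_config.to_dict().
hyperparam_mutations = {
    "lambda": lambda: random.uniform(0.9, 1.0),
    "env_config": {
        "base_line_degrees": lambda: random.uniform(0.1, 2.0),
    },
}
```

Is something like this supported, or is there a different mechanism for env hyperparameters?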
Thanks for your help :)