Tune custom gym env_config params with PBT

I am training an RL model and trying to tune its hyperparameters with PBT, following this example:

https://docs.ray.io/en/latest/tune/examples/pbt_ppo_example.html

It all works like a charm, but I can’t figure out how to tune my custom environment hyperparameters.
The PBT scheduler's hyperparam_mutations seems to accept only the algorithm's hyperparameters (PPO in my case), but not the custom environment's. In the snippet below, “base_line_degrees” is never sampled from the range (0.1, 2.0); instead it always uses the default value (-1) set in the custom gym env's init method:

import random

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env
from ray.tune.schedulers import PopulationBasedTraining

register_env("custom_env", lambda config: custom_env(config))

env_config = {
    "base_line_degrees": 1
}

ppo_config = (
    PPOConfig()
    .environment(env="custom_env", env_config=env_config)
)

hyperparam_mutations = {
    "lambda" : lambda: random.uniform(0.9,1.0),
    "clip_param" : lambda: random.uniform(0.01, 0.5),
    "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    "base_line_degrees": lambda: random.uniform(0.1,2.0)
}

pbt = PopulationBasedTraining(
    time_attr="time_total_s",
    perturbation_interval=120,
    resample_probability=0.25,
    hyperparam_mutations=hyperparam_mutations
)

analysis = tune.run(
    "PPO", 
    metric = "episode_reward_mean", 
    mode = "max",
    scheduler = pbt,
    num_samples=1,
    config = ppo_config.to_dict()
)

I’m clearly not doing it right. It doesn’t seem to me that I’m giving PBT any way to access the env_config object, but I can’t find any example/documentation on how to do it.

Can custom gym env hyperparameters be tuned with PBT? And if so, how should it be done?

Thanks for your help :)

@PREJAN did you ever find the solution? I’m facing the same obstacle.

Hi @PREJAN and @ihopethiswillfi,

apologies for the late reply here.

You can specify nested dicts in the hyperparameter mutations, like this:

pbt = PopulationBasedTraining(
    ...
    hyperparam_mutations={
        "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
        "env_config": {
            "my_param": lambda: random.uniform(1.10, 1.50)
        }
    },
)
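
Applied to the config from your original question, that means nesting “base_line_degrees” under an "env_config" key in the mutations dict, roughly like this (a sketch reusing the names from the post above):

hyperparam_mutations = {
    "lambda": lambda: random.uniform(0.9, 1.0),
    "clip_param": lambda: random.uniform(0.01, 0.5),
    "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    # Nest custom env params under "env_config" so PBT can mutate them.
    "env_config": {
        "base_line_degrees": lambda: random.uniform(0.1, 2.0),
    },
}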

Just for reference, here’s a full running toy example with a custom cartpole env:


import random

from ray import train, tune
from ray.rllib.algorithms.ppo import PPO
from ray.tune.schedulers import PopulationBasedTraining

from gymnasium.envs.classic_control.cartpole import CartPoleEnv


class MyCartPoleEnv(CartPoleEnv):
    def __init__(self, config):
        # `config` is the env_config dict RLlib passes to the env creator;
        # PBT mutates its "my_param" entry at each perturbation.
        print("MY PARAM", config["my_param"])
        super().__init__()


pbt = PopulationBasedTraining(
    time_attr="time_total_s",
    perturbation_interval=4,
    resample_probability=1,
    # Specifies the mutations of these hyperparams
    hyperparam_mutations={
        "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
        "env_config": {
            "my_param": lambda: random.uniform(1.10, 1.50)
        }
    },
)

tune.register_env("MyCartPole", MyCartPoleEnv)

tuner = tune.Tuner(
    PPO,
    run_config=train.RunConfig(
        name="pbt_cartpole",
    ),
    tune_config=tune.TuneConfig(
        scheduler=pbt,
        num_samples=8,
        metric="episode_reward_mean",
        mode="max",
    ),
    param_space={
        "env": "MyCartPole",
        "lr": 0.01,
        "env_config": {
            "my_param": 1.3
        }
    },
)
results = tuner.fit()

print("best hyperparameters: ", results.get_best_result().config)