Hyperparameter search with Tune for multi-agent environments

I’m trying to do a hyperparameter search with Tune for a custom multi-agent environment. All agents should share the same network architecture, but they are trained as individual policies. I want to iterate over LSTM hidden cell sizes, but when I specify a grid via tune.grid_search([x, x, x]) it is applied separately to each agent. How can I enforce that all agents share the same architecture?

Example:

exper_params = {"lstm_cell_size": tune.grid_search([32, 64])}


policy_map = {
    "policy_0": (
        None, obs_space_high, act_space_high,
        {"model": {"fcnet_hiddens": exper_params["policy_fcnet_hiddens"], "fcnet_activation": "tanh",
                   "use_lstm": True,
                   "lstm_cell_size": exper_params["lstm_cell_size"],
                   "max_seq_len": exper_params["policy_max_seq_length"]}}),
    "policy_1": (
        None, obs_space_high, act_space_high,
        {"model": {"fcnet_hiddens": exper_params["policy_fcnet_hiddens"], "fcnet_activation": "tanh",
                   "use_lstm": True,
                   "lstm_cell_size": exper_params["lstm_cell_size"],
                   "max_seq_len": exper_params["policy_max_seq_length"]}}),
    "policy_2": (
        None, obs_space_high, act_space_high,
        {"model": {"fcnet_hiddens": exper_params["policy_fcnet_hiddens"], "fcnet_activation": "tanh",
                   "use_lstm": True,
                   "lstm_cell_size": exper_params["lstm_cell_size"],
                   "max_seq_len": exper_params["policy_max_seq_length"]}}),
}

Hi @henry_lei, I’m not exactly sure what you want to achieve.

tune.grid_search defines the values to search over. Each of those values is sampled exactly num_samples times.

Are you talking about keeping policy_fcnet_hiddens and policy_max_seq_length constant? If so, you can just pass a constant in the search space (exper_params). If you also want to search over these, but want to keep the sampled values constant across grid-search variants, you can use the constant_grid_search parameter of the BasicVariantGenerator.
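
For illustration, a minimal sketch of how that could be wired up (assuming Ray 2.x import paths; config and param_space are the ones from the Tuner setup you post further down):

from ray import tune
from ray.tune.search.basic_variant import BasicVariantGenerator

# Re-use the same randomly sampled values for every grid-search variant.
tuner = tune.Tuner(
    "R2D2",
    tune_config=tune.TuneConfig(
        search_alg=BasicVariantGenerator(constant_grid_search=True),
    ),
    param_space=config.to_dict(),
)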

I want to search over a few values of the LSTM hidden cell size, but I want every policy to share the value produced by the generator so that the network architecture is homogeneous across agents. As the config is currently written, each policy seems to receive its own copy of the grid-search generator rather than the single value it produces. For example, if I specify a grid_search generator in exper_params as grid_search([1, 2, 3]), it produces trials like:

policy_1: 1, policy_2: 1, policy_3: 1
policy_1: 1, policy_2: 1, policy_3: 2
policy_1: 1, policy_2: 1, policy_3: 3
policy_1: 1, policy_2: 2, policy_3: 1, etc.

for a total of 27 trials (3³ combinations),

whereas I would want:

policy_1: 1, policy_2: 1, policy_3: 1
policy_1: 2, policy_2: 2, policy_3: 2
policy_1: 3, policy_2: 3, policy_3: 3

for a total of 3 trials.

Can you post your full config and the code where you instantiate the Tuner and call tuner.fit() (or even more of the code)?

If you want to use the same LSTM size across policies, you should be able to just instantiate them all with the same config parameter within the trainable. Since this is presumably related to RLlib, more context and code would be helpful.
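
As a minimal sketch of that idea (reusing obs_space_high / act_space_high from your snippet; the hard-coded 64 stands in for whatever single value gets sampled for the trial):

# Define the model config once so every policy is guaranteed the same architecture.
shared_model = {
    "fcnet_hiddens": [64, 64],
    "fcnet_activation": "tanh",
    "use_lstm": True,
    "lstm_cell_size": 64,  # the single sampled value, shared by all policies
    "max_seq_len": 20,
}

policy_map = {
    f"policy_{i}": (None, obs_space_high, act_space_high, {"model": dict(shared_model)})
    for i in range(3)
}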

# Policy Selection Method
def select_policy(agent_id):
    if agent_id == "deputy_0":
        policyname = "policy_0"
    elif agent_id == "deputy_1":
        policyname = "policy_1"
    elif agent_id == "deputy_2":
        policyname = "policy_2"
    return policyname



# Training configs
exper_params = {"bufferCap": 100000, "burn_in": 20, "learn_rate": tune.grid_search([0.00005, 0.0001, 0.0005, 0.001]), "batch_size": tune.grid_search([128, 256, 512, 1028]),
        "discount_rate": tune.grid_search([0.75, 0.85, 0.95, 0.99]), "num_workers": 15, "policy_fcnet_hiddens": [64, 64], "lstm_cell_size": tune.grid_search([32, 64, 128, 256]),
        "policy_max_seq_length": 20, "timesteps_trained": 75000}

# Chief object and viewpoint params
chief_params = {"Point cloud": infoEnv.chief_object.ptCldName, "Number of points": infoEnv.chief_object.numPoints,
        "Diam of bounding box": infoEnv.chief_object.diam, "Projection diam": infoEnv.chief_object.sphereRadius,
        "Viewpoint dist from origin": infoEnv.chief_object.viewScale}

# Environment params
env_params = {"Rotation Mode": infoEnv.env_config["RotationMode"], "Number of viewpoints": infoEnv.num_inspection_points, "Agent Field of View": infoEnv.FOV,
        "Angular velocity scaling": infoEnv.SF, "POI reward": infoEnv.POI_reward, "Fuel penalty": infoEnv.fuel_penalty, "Inspection threshold": infoEnv.info_thresh,
        "Reward tranlsation": infoEnv.reward_translation}


# Configs
config = R2D2Config()
config.environment(env="HLInfoInspEnv", env_config=env_config)
# dict.update() mutates replay_buffer_config in place and returns None, so no assignment is needed
config.replay_buffer_config.update({"capacity": exper_params["bufferCap"],
                                    "replay_burn_in": exper_params["burn_in"]})
config.training(lr=exper_params["learn_rate"],
                train_batch_size=exper_params["batch_size"],
                gamma=exper_params["discount_rate"])
config.rollouts(num_rollout_workers=exper_params["num_workers"])
policy_map = {
    "policy_0": (
        None, obs_space_high, act_space_high,
        {"model": {"fcnet_hiddens": exper_params["policy_fcnet_hiddens"], "fcnet_activation": "tanh",
                   "use_lstm": True,
                   "lstm_cell_size": exper_params["lstm_cell_size"],
                   "max_seq_len": exper_params["policy_max_seq_length"]}}),
    "policy_1": (
        None, obs_space_high, act_space_high,
        {"model": {"fcnet_hiddens": exper_params["policy_fcnet_hiddens"], "fcnet_activation": "tanh",
                   "use_lstm": True,
                   "lstm_cell_size": exper_params["lstm_cell_size"],
                   "max_seq_len": exper_params["policy_max_seq_length"]}}),
    "policy_2": (
        None, obs_space_high, act_space_high,
        {"model": {"fcnet_hiddens": exper_params["policy_fcnet_hiddens"], "fcnet_activation": "tanh",
                   "use_lstm": True,
                   "lstm_cell_size": exper_params["lstm_cell_size"],
                   "max_seq_len": exper_params["policy_max_seq_length"]}}),
}
config.multi_agent(policies=policy_map, policy_mapping_fn=select_policy)

# Run configs
stop_dict = {'timesteps_total': exper_params["timesteps_trained"]}


# Train - saves experiment to an output folder.
tuner = tune.Tuner(
    "R2D2",
    run_config=air.RunConfig(
        name="experiment_output",
        stop=stop_dict,
        local_dir=output_dir,
        sync_config=sync_config,
        checkpoint_config=air.CheckpointConfig(
            checkpoint_score_attribute="episode_reward_mean",
            checkpoint_frequency=1,
            num_to_keep=2,
        ),
    ),
    param_space=config.to_dict())
results = tuner.fit()

Thanks! This is helpful.

The way you can do this is with tune.sample_from, which can access existing config keys.

For example, it could look like this:

policy_map = {
    "policy_0": (
        None, obs_space_high, act_space_high,
        {"model": {"fcnet_hiddens": exper_params["policy_fcnet_hiddens"], "fcnet_activation": "tanh",
                   "use_lstm": True,
                   "lstm_cell_size": tune.grid_search([32, 64, 128, 256]),
                   "max_seq_len": exper_params["policy_max_seq_length"]}}),
    "policy_1": (
        None, obs_space_high, act_space_high,
        {"model": {"fcnet_hiddens": exper_params["policy_fcnet_hiddens"], "fcnet_activation": "tanh",
                   "use_lstm": True,
                   "lstm_cell_size": tune.sample_from(lambda config: config["policies"]["policy_0"][3]["model"]["lstm_cell_size"]),
                   "max_seq_len": exper_params["policy_max_seq_length"]}}),
    "policy_2": (
        None, obs_space_high, act_space_high,
        {"model": {"fcnet_hiddens": exper_params["policy_fcnet_hiddens"], "fcnet_activation": "tanh",
                   "use_lstm": True,
                   "lstm_cell_size": tune.sample_from(lambda config: config["policies"]["policy_0"][3]["model"]["lstm_cell_size"]),
                   "max_seq_len": exper_params["policy_max_seq_length"]}}),
}

Notice how we use grid_search only once (for policy_0) and otherwise refer to the existing parameter with tune.sample_from(lambda config: config["policies"]["policy_0"][3]["model"]["lstm_cell_size"]). That way each trial resolves lstm_cell_size once and the other policies copy that resolved value, so the LSTM size contributes only one dimension to the grid instead of a cross product across policies.
