Multiple algorithms in hierarchical training

Is it possible to use different algorithms for hierarchical training (hierarchical_training.py)?
For example, PPO for the low-level steps and DQN for the high-level steps.

@sven1977 please take a look at this question

I tried this, but it didn't work. I couldn't split the training in two by agent_id.
How do I get the agent_id inside my_train_fn?

def policy_mapping_fn(agent_id):
    if agent_id.startswith("low_level_"):
        return "low_level_policy"
    else:
        return "high_level_policy"

def my_train_fn(config):
    # This is where I get stuck: agent_id is not available in this scope.
    if agent_id.startswith("low_level_"):
        return PPOTrainer(config=config)
    else:
        return DQNTrainer(config=config)

stop = {
    "training_iteration": args.stop_iters,
    "timesteps_total": args.stop_timesteps,
}

if args.flat:
    results = tune.run(
        my_train_fn,
        stop=stop,
        config={
            "env": WindyMazeEnv,
            "num_workers": 0,
            "framework": "torch" if args.torch else "tf",
        },
    )
else:
    maze = WindyMazeEnv(None)
    config = {
        "env": HierarchicalWindyMazeEnv,
        "num_workers": 0,
        "entropy_coeff": 0.01,
        "multiagent": {
            "policies": {
                "high_level_policy": (None, maze.observation_space,
                                      Discrete(4), {
                                          "gamma": 0.9
                                      }),
                "low_level_policy": (None,
                                     Tuple([
                                         maze.observation_space,
                                         Discrete(4)
                                     ]), maze.action_space, {
                                         "gamma": 0.0
                                     }),
            },
            "policy_mapping_fn": policy_mapping_fn,
        },
        "framework": "torch" if args.torch else "tf",
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
    }
    results = tune.run(my_train_fn, stop=stop, config=config, verbose=1)

@hosokawa-taiji,

"multiagent": {
            "policies": {
                "high_level_policy": (None, maze.observation_space,
                                      Discrete(4), {
                                          "gamma": 0.9
                                      }),
                "low_level_policy": (None,
                                     

See those Nones as the first element of the policy tuples? A None means that policy uses the running trainer's default policy class, but you can specify a different policy class for each entry if you want.
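For example, something like this (an untested sketch; it assumes a Ray 1.x-style RLlib API where PPOTorchPolicy and DQNTorchPolicy can be imported from ray.rllib.agents.ppo and ray.rllib.agents.dqn, and reuses maze, HierarchicalWindyMazeEnv, and policy_mapping_fn from your code above):

from gym.spaces import Discrete, Tuple
from ray.rllib.agents.dqn import DQNTorchPolicy
from ray.rllib.agents.ppo import PPOTorchPolicy

config = {
    "env": HierarchicalWindyMazeEnv,
    "multiagent": {
        "policies": {
            # High-level decisions: use DQN's policy class instead of the default.
            "high_level_policy": (DQNTorchPolicy, maze.observation_space,
                                  Discrete(4), {"gamma": 0.9}),
            # Low-level steps: use PPO's policy class.
            "low_level_policy": (PPOTorchPolicy,
                                 Tuple([maze.observation_space, Discrete(4)]),
                                 maze.action_space, {"gamma": 0.0}),
        },
        "policy_mapping_fn": policy_mapping_fn,
    },
    "framework": "torch",
}

Keep in mind that whichever single trainer you run still drives the overall sampling and optimization loop; as far as I know, the per-policy class mainly swaps in that policy's own model, loss, and postprocessing.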

Thank you for your advice!
It works fine!