I am training a hierarchical agent with one HIGH_LEVEL policy and multiple LOW_LEVEL_* policies. Initially, I intend to train only the HIGH_LEVEL policy while executing low-level control with predefined (scripted) logic. Later, I want to train the LOW_LEVEL_* policies against the previously trained HIGH_LEVEL policy. This approach is known as training in hindsight and can be useful for training hierarchical agents on complex tasks. So my question is: is it possible to train only the HIGH_LEVEL policy first and then train the LOW_LEVEL_* policies by restoring the trained HIGH_LEVEL policy? If yes, what is the correct way to train the LOW_LEVEL_* policies using the trained HIGH_LEVEL policy?
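For context, by "predefined logic" I mean a scripted, non-trainable RLlib Policy roughly like the sketch below (the class name and the always-zero action are hypothetical placeholders; my actual rule is more involved):

import numpy as np
from ray.rllib.policy.policy import Policy

class ScriptedLowLevel(Policy):
    """Fixed (non-trainable) low-level controller."""

    def compute_actions(self, obs_batch, state_batches=None, **kwargs):
        # Hypothetical rule: emit action 0 for every observation.
        return np.zeros(len(obs_batch), dtype=np.int64), [], {}

    def learn_on_batch(self, samples):
        return {}  # nothing to learn

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass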
Here is the code I used to try training the hierarchical policies in hindsight.
First, to train the HIGH_LEVEL policy:
from ray.rllib.algorithms import a2c

config_hl = (
    a2c.A2CConfig()
    # ... other configurations
    .multi_agent(
        policies={"HIGH_LEVEL", "LOW_LEVEL_A0", "LOW_LEVEL_A1", "LOW_LEVEL_B0",
                  "LOW_LEVEL_B1", "LOW_LEVEL_C0", "LOW_LEVEL_C1"},
        policies_to_train=["HIGH_LEVEL"],
        policy_mapping_fn=get_mapping(),  # returns the agent->policy mapping fn
    )
)
algo_hl = config_hl.build()
algo_hl.train()
checkpoint = algo_hl.save()  # checkpoint used in the next step
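For reference, get_mapping() returns the agent-to-policy mapping function, roughly like this (the agent ID naming convention here is hypothetical):

def get_mapping():
    def mapping_fn(agent_id, episode, worker, **kwargs):
        # Hypothetical convention: one "high_level" agent; low-level
        # agents named "a0", "a1", "b0", ... match the LOW_LEVEL_* policies.
        if agent_id == "high_level":
            return "HIGH_LEVEL"
        return f"LOW_LEVEL_{agent_id.upper()}"
    return mapping_fn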
Next, to train the LOW_LEVEL_* policies:
from ray.rllib.algorithms.algorithm import Algorithm

algo_ll = Algorithm.from_checkpoint(
    checkpoint=checkpoint,
    policies_to_train=["LOW_LEVEL_A0", "LOW_LEVEL_A1", "LOW_LEVEL_B0",
                       "LOW_LEVEL_B1", "LOW_LEVEL_C0", "LOW_LEVEL_C1"],
)
algo_ll.train()
But algo_ll.train() is retraining the HIGH_LEVEL policy as well, even though it is not listed in policies_to_train!
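A rough way to see which policies get updated is to compare weights across a train step (this sketch assumes get_weights() returns a dict of numpy arrays):

import numpy as np

before = algo_ll.get_policy("HIGH_LEVEL").get_weights()
algo_ll.train()
after = algo_ll.get_policy("HIGH_LEVEL").get_weights()
# If HIGH_LEVEL were frozen, every weight tensor would be unchanged.
print(any(not np.array_equal(before[k], after[k]) for k in before))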