I am training a hierarchical agent with one HIGH_LEVEL policy and multiple LOW_LEVEL_* policies. Initially, I intend to train only the HIGH_LEVEL policy while executing low-level control with predefined (scripted) logic. Later, I want to train the LOW_LEVEL_* policies against the previously trained HIGH_LEVEL policy. This approach is known as training in hindsight and can be useful for training hierarchical agents on complex tasks. So my question is: is it possible to train only the HIGH_LEVEL policy first and then train the LOW_LEVEL_* policies by restoring the trained HIGH_LEVEL policy? If yes, what is the correct way to train the LOW_LEVEL_* policies using the trained HIGH_LEVEL policy?
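For context, by "predefined logic" I mean a scripted, non-trainable RLlib Policy roughly like the sketch below (the class name and the always-zero action are hypothetical placeholders; my actual rule is more involved):

import numpy as np
from ray.rllib.policy.policy import Policy

class ScriptedLowLevel(Policy):
    """Fixed (non-trainable) low-level controller."""

    def compute_actions(self, obs_batch, state_batches=None, **kwargs):
        # Hypothetical rule: emit action 0 for every observation.
        return np.zeros(len(obs_batch), dtype=np.int64), [], {}

    def learn_on_batch(self, samples):
        return {}  # nothing to learn

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass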
Here is the code I used to try training the hierarchical policies in hindsight.
First, to train the HIGH_LEVEL policy:
from ray.rllib.algorithms import a2c

config_hl = (
    a2c.A2CConfig()
    # ... other configurations
    .multi_agent(
        policies={"HIGH_LEVEL", "LOW_LEVEL_A0", "LOW_LEVEL_A1", "LOW_LEVEL_B0",
                  "LOW_LEVEL_B1", "LOW_LEVEL_C0", "LOW_LEVEL_C1"},
        policies_to_train=["HIGH_LEVEL"],
        policy_mapping_fn=get_mapping(),  # returns the agent->policy mapping fn
    )
)
algo_hl = config_hl.build()
algo_hl.train()
checkpoint = algo_hl.save()  # checkpoint used in the next step
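For reference, get_mapping() returns the agent-to-policy mapping function, roughly like this (the agent ID naming convention here is hypothetical):

def get_mapping():
    def mapping_fn(agent_id, episode, worker, **kwargs):
        # Hypothetical convention: one "high_level" agent; low-level
        # agents named "a0", "a1", "b0", ... match the LOW_LEVEL_* policies.
        if agent_id == "high_level":
            return "HIGH_LEVEL"
        return f"LOW_LEVEL_{agent_id.upper()}"
    return mapping_fn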
Next, to train the LOW_LEVEL_* policies:
from ray.rllib.algorithms.algorithm import Algorithm

algo_ll = Algorithm.from_checkpoint(
    checkpoint=checkpoint,
    policies_to_train=["LOW_LEVEL_A0", "LOW_LEVEL_A1", "LOW_LEVEL_B0",
                       "LOW_LEVEL_B1", "LOW_LEVEL_C0", "LOW_LEVEL_C1"],
)
algo_ll.train()
But algo_ll.train() is retraining the HIGH_LEVEL policy as well, even though it is not listed in policies_to_train!
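A rough way to see which policies get updated is to compare weights across a train step (this sketch assumes get_weights() returns a dict of numpy arrays):

import numpy as np

before = algo_ll.get_policy("HIGH_LEVEL").get_weights()
algo_ll.train()
after = algo_ll.get_policy("HIGH_LEVEL").get_weights()
# If HIGH_LEVEL were frozen, every weight tensor would be unchanged.
print(any(not np.array_equal(before[k], after[k]) for k in before))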