I am training a hierarchical agent with one HIGH_LEVEL policy and multiple LOW_LEVEL_* policies. Initially, I intend to train only the HIGH_LEVEL policy while executing low-level control with predefined logic. Later, I want to train the LOW_LEVEL_* policies using the previously trained HIGH_LEVEL policy. This approach is known as training in hindsight, and it can be useful for training hierarchical agents on complex tasks. So my question is: is it possible to train only the HIGH_LEVEL policy first and then train the LOW_LEVEL_* policies by restoring the trained HIGH_LEVEL policy? If yes, what is the correct way to train the LOW_LEVEL_* policies using the trained HIGH_LEVEL policy?
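For context, during the first phase the predefined low-level control could be realized as fixed, non-learning policies. A minimal sketch of what I mean (ScriptedLowLevelPolicy and its action logic are placeholders, not my actual controller):

from ray.rllib.policy.policy import Policy

class ScriptedLowLevelPolicy(Policy):
    """Fixed (non-learning) low-level controller; placeholder logic only."""

    def compute_actions(self, obs_batch, state_batches=None, **kwargs):
        # The predefined control logic would go here; random sampling
        # merely stands in for it in this sketch.
        return [self.action_space.sample() for _ in obs_batch], [], {}

    def learn_on_batch(self, samples):
        return {}  # nothing to learn

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass

Each LOW_LEVEL-* policy ID could then point at such a class via PolicySpec(policy_class=ScriptedLowLevelPolicy) in the multi_agent() call below.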
Here is the example code I used to try training the hierarchical policies in hindsight. To train the HIGH_LEVEL policy:
from ray.rllib.algorithms import a2c

config_hl = (
    a2c.A2CConfig()
    ....  # other configurations
    .multi_agent(
        policies=["HIGH_LEVEL", "LOW_LEVEL-A0", "LOW_LEVEL-A1", "LOW_LEVEL-B0",
                  "LOW_LEVEL-B1", "LOW_LEVEL-C0", "LOW_LEVEL-C1"],
        policies_to_train=["HIGH_LEVEL"],
        policy_mapping_fn=get_mapping(),
    )
)
algo_hl = config_hl.build()
algo_hl.train()
checkpoint = algo_hl.save()  # saved for restoring below
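get_mapping() returns the policy-mapping function; schematically it looks like this (the agent IDs here are illustrative, not my exact ones):

def get_mapping():
    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        # Illustrative mapping: the coordinating agent uses HIGH_LEVEL;
        # every other agent ID coincides with its low-level policy ID,
        # e.g. "LOW_LEVEL-A0".
        return "HIGH_LEVEL" if agent_id == "high_level_agent" else agent_id
    return policy_mapping_fn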
Next, to train the LOW_LEVEL_* policies:
from ray.rllib.algorithms.algorithm import Algorithm

algo_ll = Algorithm.from_checkpoint(
    checkpoint=checkpoint,
    policies_to_train=["LOW_LEVEL-A0", "LOW_LEVEL-A1", "LOW_LEVEL-B0",
                       "LOW_LEVEL-B1", "LOW_LEVEL-C0", "LOW_LEVEL-C1"],
)
algo_ll.train()
However, algo_ll.train() is retraining the HIGH_LEVEL policy itself!
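The unwanted update can be checked by comparing the HIGH_LEVEL weights across a train() call; if the policy were truly frozen, this should print True:

import numpy as np

w_before = algo_ll.get_policy("HIGH_LEVEL").get_weights()
algo_ll.train()
w_after = algo_ll.get_policy("HIGH_LEVEL").get_weights()

# True only if every HIGH_LEVEL parameter is unchanged by the train() call.
print(all(np.allclose(w_before[k], w_after[k]) for k in w_before))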