How to train hierarchical policies in hindsight?

I am training a hierarchical agent with one HIGH_LEVEL policy and multiple LOW_LEVEL_* policies. Initially, I intend to train only the HIGH_LEVEL policy while executing low-level control with predefined logic. Later, I want to train the LOW_LEVEL_* policies on top of the previously trained HIGH_LEVEL policy. This approach is known as training in hindsight, and it can be useful for training hierarchical agents on complex tasks. So my question is: is it possible to train only the HIGH_LEVEL policy first and then train the LOW_LEVEL_* policies by restoring the trained HIGH_LEVEL policy? If yes, what is the correct way to train the LOW_LEVEL_* policies using the trained HIGH_LEVEL policy?

Here is the example code I used to try training the hierarchical policies in hindsight.

To train the HIGH_LEVEL policy,

from ray.rllib.algorithms import a2c

config_hl = (
    a2c.A2CConfig()
    # ... other configurations (environment, rollouts, training, etc.)
    .multi_agent(
        policies={"HIGH_LEVEL", "LOW_LEVEL_A0", "LOW_LEVEL_A1", "LOW_LEVEL_B0",
                  "LOW_LEVEL_B1", "LOW_LEVEL_C0", "LOW_LEVEL_C1"},
        policies_to_train=["HIGH_LEVEL"],
        policy_mapping_fn=get_mapping(),  # get_mapping() returns the agent-to-policy mapping function
    )
)
algo_hl = config_hl.build()
algo_hl.train()
checkpoint = algo_hl.save()  # checkpoint of the trained HIGH_LEVEL policy, restored below
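
Here, get_mapping() just returns the agent-to-policy mapping function. A minimal sketch of what it looks like (the agent IDs below are simplified placeholders, not my actual IDs):

def get_mapping():
    # Map each environment agent to its policy by name.
    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        if agent_id == "high_level":  # placeholder agent ID
            return "HIGH_LEVEL"
        # e.g. agent "a0" -> policy "LOW_LEVEL_A0"
        return f"LOW_LEVEL_{agent_id.upper()}"
    return policy_mapping_fn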

Next, to train LOW_LEVEL_* policies,

from ray.rllib.algorithms.algorithm import Algorithm

algo_ll = Algorithm.from_checkpoint(
    checkpoint=checkpoint,
    policies_to_train=["LOW_LEVEL_A0", "LOW_LEVEL_A1", "LOW_LEVEL_B0",
                       "LOW_LEVEL_B1", "LOW_LEVEL_C0", "LOW_LEVEL_C1"],
)
algo_ll.train()

However, algo_ll.train() ends up training the HIGH_LEVEL policy again, even though it is not listed in policies_to_train!

While training the LOW_LEVEL_* policies, I also needed to change the environment configuration. After following the solution from https://github.com/ray-project/ray/issues/9012, I was able to load the trained weights and start training with the new environment configuration.
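
Roughly, that weight-loading workaround looks like the sketch below (a minimal sketch; new_env_config stands in for my actual environment changes, and the multi-agent setup is the same as above except for policies_to_train):

# Restore the algorithm trained earlier and pull out the HIGH_LEVEL weights.
restored = Algorithm.from_checkpoint(checkpoint)
high_level_weights = restored.get_policy("HIGH_LEVEL").get_weights()

# Build a fresh algorithm with the new environment configuration,
# this time training only the LOW_LEVEL_* policies.
config_ll = (
    a2c.A2CConfig()
    .environment(env_config=new_env_config)  # new_env_config is a placeholder
    # ... other configurations, same as before
    .multi_agent(
        policies={"HIGH_LEVEL", "LOW_LEVEL_A0", "LOW_LEVEL_A1", "LOW_LEVEL_B0",
                  "LOW_LEVEL_B1", "LOW_LEVEL_C0", "LOW_LEVEL_C1"},
        policies_to_train=["LOW_LEVEL_A0", "LOW_LEVEL_A1", "LOW_LEVEL_B0",
                           "LOW_LEVEL_B1", "LOW_LEVEL_C0", "LOW_LEVEL_C1"],
        policy_mapping_fn=get_mapping(),
    )
)
algo_ll = config_ll.build()

# Copy the trained HIGH_LEVEL weights into the new algorithm and sync them
# to the rollout workers before training the LOW_LEVEL_* policies.
algo_ll.get_policy("HIGH_LEVEL").set_weights(high_level_weights)
algo_ll.workers.sync_weights()
algo_ll.train()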