Finetuning MBMPO policy

I am trying to finetune the MBMPO policy using PPO.

Restoring an MBMPO checkpoint directly into PPO (ppo_agent.restore(checkpoint_path)) does not work and fails with the error below, because the optimizer state does not match:

ValueError: loaded state dict contains a parameter group that doesn’t match the size of optimizer’s group

To avoid this issue, I take the weights of the MBMPO policy and set them in the PPO agent using set_weights. However, that gives me the error below:

RuntimeError: Error(s) in loading state_dict for FullyConnectedNetwork:
Missing key(s) in state_dict: “_value_branch_separate.0._model.0.weight”, “_value_branch_separate.0._model.0.bias”, “_value_branch_separate.1._model.0.weight”, “_value_branch_separate.1._model.0.bias”.

The weights dict of the MBMPO policy has these keys: “_logits._model.0.weight”, “_logits._model.0.bias”, “_hidden_layers.0._model.0.weight”, “_hidden_layers.0._model.0.bias”, “_hidden_layers.1._model.0.weight”, “_hidden_layers.1._model.0.bias”.
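
For reference, here is roughly what I am doing (a simplified sketch; mbmpo_agent is an illustrative name for the already-built MBMPO trainer, ppo_agent as above):

```python
# Simplified sketch of the attempted weight transfer (illustrative names).
mbmpo_weights = mbmpo_agent.get_policy().get_weights()
ppo_agent.get_policy().set_weights(mbmpo_weights)  # raises the RuntimeError above

# Comparing the key sets makes the mismatch explicit:
ppo_weights = ppo_agent.get_policy().get_weights()
print("PPO expects but MBMPO lacks:", sorted(set(ppo_weights) - set(mbmpo_weights)))
print("MBMPO has but PPO does not expect:", sorted(set(mbmpo_weights) - set(ppo_weights)))
```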

Question: How can I fine-tune the MBMPO policy?

@sven1977 @michaelzhiluo1

Hi @Nehal_Soni ,

If you want to restore MBMPO policy weights into PPO, you will have to make sure that the weights dict exactly matches the networks you are trying to restore. This will not be possible without some digging and handcrafting, and it is not supported by our public APIs.

  1. Instantiate an MBMPO Algorithm on your environment, with the rest of the settings mirroring your PPO settings where possible
  2. Extract the policy from the Algorithm via algorithm.get_policy(DEFAULT_POLICY_ID)
  3. Have a good look at the models contained in the policy; print them
  4. Do the same for PPO and compare the models - you will have to make them look the same for anything going forward to make sense
  5. Write your own policy classes, inheriting from the MBMPO and PPO policies of the framework of your choice
  6. Modify their get_weights and set_weights methods so that the names of all variables match upon restoration (see the rough sketch after this list)
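
A rough sketch of what these steps can look like (this is not a supported workflow; the import paths depend on your RLlib version - in Ray 2.x MBMPO lives under ray.rllib.algorithms.mbmpo - and the env name and the simplified transfer helper are illustrative assumptions):

```python
from ray.rllib.algorithms.mbmpo import MBMPOConfig
from ray.rllib.algorithms.ppo import PPOConfig

# 1./2. Build both Algorithms on the same env and pull out their default policies.
# "YourMBMPOCompatibleEnv" is a placeholder - MBMPO has its own env requirements.
mbmpo_algo = MBMPOConfig().environment("YourMBMPOCompatibleEnv").framework("torch").build()
ppo_algo = PPOConfig().environment("YourMBMPOCompatibleEnv").framework("torch").build()
mbmpo_policy = mbmpo_algo.get_policy()  # DEFAULT_POLICY_ID
ppo_policy = ppo_algo.get_policy()

# 3./4. Inspect the torch models; the printed layer names are what the
# weight-dict keys are derived from.
print(mbmpo_policy.model)
print(ppo_policy.model)

# 5./6. Simplified stand-in for the remapping logic you would put into your
# own set_weights: copy every parameter that exists under the same name and
# leave the rest (e.g. PPO's separate value branch) freshly initialized.
def transfer_matching_weights(src_policy, dst_policy):
    src = src_policy.get_weights()
    merged = {k: src.get(k, v) for k, v in dst_policy.get_weights().items()}
    dst_policy.set_weights(merged)

transfer_matching_weights(mbmpo_policy, ppo_policy)
```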

Cheers

Thank you @arturn for your quick response, it’s helpful.

I understand that a MAML policy needs to be fine-tuned and that this is possible directly with RLlib's PPO algorithm (this thread mentions it and it has also been tested: MAML finetune adaptation step for inference).

Is there any better approach in RLlib to fine-tune an MBMPO policy?

Regards

Hi @Nehal_Soni ,

Not that I know of. Our current implementations of policies and models make this process very cumbersome. But @kourosh is redesigning the policy and model APIs and, together with connectors, you will probably see many changes that better support your use case.

The thread you mention shows that it’s possible with MAML, but not that the steps are any different. You still have to create a perfect match between the model parameters and then manually reconstruct the model - no way around that at the moment.
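
For completeness, one way the manual part can look at the torch level (a rough sketch, not an official API; it assumes the torch framework, that the shared layers have identical shapes, and illustrative variable names):

```python
import torch

# Pull the MBMPO policy weights (numpy arrays keyed by state_dict names) and
# the PPO policy's underlying torch model.
mbmpo_state = mbmpo_algo.get_policy().get_weights()
ppo_model = ppo_algo.get_policy().model

# Load non-strictly: keys that only exist on the PPO side (e.g. the separate
# value branch) keep their fresh initialization; unexpected keys are skipped.
tensors = {k: torch.as_tensor(v) for k, v in mbmpo_state.items()}
result = ppo_model.load_state_dict(tensors, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```

Keep in mind this only touches the local policy's model; the updated weights still have to be broadcast to the rollout workers before training continues.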

Cheers