Partial freeze and partial train

yiwc · November 14, 2021, 8:06am

Dear RLlib team,

Thanks for your great work on RLlib; we all enjoy using it very much! However, we are unsure how to implement the following feature and would appreciate your advice on the recommended approach in RLlib.

Problem Description:
We want to add one more layer on top of an already trained policy, as in transfer learning. The trained policy should stay frozen during training, and only the new layer should be trained.

Our Naive Solution:
Our plan is to use your custom_train_workflow related features. Within the custom training workflow, we would select which of the model's parameters to freeze and which to train.

Thanks for your help. Please let us know whether our solution is correct and follows the RLlib philosophy, or whether you already have a much easier solution for this.

arturn · November 14, 2021, 11:02pm

Hi @yiwc ,

If you implement your model yourself, i.e. with the ModelV2 API, you can simply put a tf.stop_gradient() in your forward pass function.

Otherwise, you can update your policy with a new apply_gradients_fn that only applies the gradients to your one layer and leaves the other ones alone.
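The idea of applying gradients to only one layer can be sketched without any framework. This is only an illustration of the selection logic, not RLlib's actual apply_gradients_fn signature; the function name apply_selected_gradients, the parameter names, and the plain SGD update are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def apply_selected_gradients(
    params: Params,
    grads: Params,
    trainable_prefixes: Sequence[str],
    lr: float = 0.1,
) -> Params:
    """Plain SGD step that only touches parameters with a trainable prefix.

    Parameters whose name does not start with one of `trainable_prefixes`
    are copied over unchanged, i.e. they stay frozen.
    """
    prefixes = tuple(trainable_prefixes)
    updated: Params = {}
    for name, values in params.items():
        if name.startswith(prefixes):
            # Trainable: apply the gradient.
            updated[name] = [w - lr * g for w, g in zip(values, grads[name])]
        else:
            # Frozen: ignore the gradient entirely.
            updated[name] = list(values)
    return updated


# Hypothetical parameter names: one pre-trained layer, one new layer.
params = {"pretrained/dense": [1.0, 2.0], "new_layer/dense": [0.5, -0.5]}
grads = {"pretrained/dense": [10.0, 10.0], "new_layer/dense": [1.0, 2.0]}

new_params = apply_selected_gradients(params, grads, ["new_layer/"])
```

In a real policy the same filter would be applied to the (gradient, variable) pairs before they are handed to the optimizer.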

If you have questions on how to do this, I will be happy to answer them.

Cheers

mannyv · November 15, 2021, 1:58pm

Hi @yiwc,

If you are using torch, you could write a function to freeze the layers of interest by setting requires_grad=False on those layers' parameters.

If you are using tf and keras, you can set layer.trainable=False.

With either framework, you could then create a new trainer that applies this function, similar to the method described in this post:
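For the torch case, a minimal sketch in plain PyTorch (outside RLlib) could look like the following. The small networks pretrained and new_head are hypothetical stand-ins for the trained policy and the new layer, and the freeze helper is a name chosen for this example.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a "trained" network and one new layer on top.
pretrained = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 8))
new_head = nn.Linear(8, 2)
model = nn.Sequential(pretrained, new_head)


def freeze(module: nn.Module) -> None:
    """Exclude all parameters of `module` from gradient computation."""
    for param in module.parameters():
        param.requires_grad = False


freeze(pretrained)

# Hand only the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

# Snapshots so the effect of one update can be inspected afterwards.
frozen_before = [p.detach().clone() for p in pretrained.parameters()]
head_before = new_head.weight.detach().clone()

# One dummy training step on random data.
loss = model(torch.randn(32, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, the parameters of pretrained are unchanged and only new_head has moved. How the freeze function is hooked into a trainer depends on the RLlib version in use.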

yiwc · November 15, 2021, 2:42pm

Hi @arturn @mannyv ,

We appreciate your immediate reply!

Yes, we will try apply_gradients_fn and see how far we can go.

Thanks again, and have a good day!

yiwc · November 21, 2021, 4:00am

Hi @arturn,

Thanks for your advice. Now we are trying to fine-tune a model starting from a loaded, previously trained model. Where do you think we should put the code that loads the model?

We thought of a few possible solutions.

We could put it in the custom training execution plan, so that the pre-trained model is loaded before training starts.

# some brief pseudocode just for idea
def execution_plan(xxx):
    policy.load(pretrained_model1,pretrained_model2)

We could also load the pre-trained model before the tune function starts.

# some brief pseudocode just for idea
my_trainer=trainer(xxx)
my_trainer.policy.load_models(pretrained_model_weights1,pretrained_model_weights2)

We would appreciate your help and advice on whether these are recommended solutions.
Regards,

arturn · November 21, 2021, 2:47pm

Hi @yiwc ,

Glad we could help.
If you want to load a complete model of a previously trained policy, the easiest way is to call the restore method of your Trainer. From the docs:

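import ray.rllib.agents.ppo as ppo

# config, env_class and checkpoint_path must be defined by you beforehand
agent = ppo.PPOTrainer(config=config, env=env_class)
agent.restore(checkpoint_path)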

Does this work for you? There are of course other ways and more elaborate solutions.
I am sure @mannyv has more to offer.
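As one possible example of such another way, for torch models the pre-trained weights can be loaded into only part of a larger model with load_state_dict(..., strict=False). The sketch below is plain PyTorch, not an RLlib API; TwoStageModel and the attribute names base and new_layer are hypothetical, and the "pre-trained" weights are created in memory instead of being read from a checkpoint.

```python
import torch
import torch.nn as nn


class TwoStageModel(nn.Module):
    """Hypothetical model: a pre-trained base followed by one new layer."""

    def __init__(self) -> None:
        super().__init__()
        self.base = nn.Linear(4, 8)
        self.new_layer = nn.Linear(8, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.new_layer(torch.relu(self.base(x)))


# Stand-in for weights saved from an earlier training run.
pretrained_base = nn.Linear(4, 8)
pretrained_state = {
    f"base.{key}": value for key, value in pretrained_base.state_dict().items()
}

model = TwoStageModel()
# strict=False allows the state dict to cover only part of the model;
# the returned object lists the keys that were not loaded.
result = model.load_state_dict(pretrained_state, strict=False)
missing = sorted(result.missing_keys)
```

Here missing contains only the keys of new_layer, which keeps its fresh initialization and can then be trained while base is frozen.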

Cheers
