Dear RLlib team,
Thanks for your great work on RLlib, we all enjoy it very much! However, we have run into a problem figuring out how to implement a feature, and we would appreciate your advice on the suggested way to do this in RLlib.
The feature requires one more layer of transfer learning on top of an already trained policy: the trained policy should stay frozen during training, while only the new layer is trained.
Our Naive Solution:
Our plan is to use your custom_train_workflow related features. In the custom training workflow, we would select the part of the model's parameters to freeze and train only the remaining parameters.
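To make our intent concrete, here is a minimal PyTorch sketch of the freezing scheme we have in mind. The class names (`PretrainedPolicy`, `TransferPolicy`) and the tiny network shapes are hypothetical stand-ins, not RLlib API; in our real setup the frozen part would be the model of the already trained policy:

```python
import torch
import torch.nn as nn

class PretrainedPolicy(nn.Module):
    """Hypothetical stand-in for the already trained policy network."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 8)

    def forward(self, x):
        return torch.relu(self.backbone(x))

class TransferPolicy(nn.Module):
    """New trainable layer stacked on top of the frozen policy."""
    def __init__(self, pretrained):
        super().__init__()
        self.pretrained = pretrained
        # Freeze every parameter of the trained policy.
        for p in self.pretrained.parameters():
            p.requires_grad = False
        # The only trainable part of the combined model.
        self.new_head = nn.Linear(8, 2)

    def forward(self, x):
        with torch.no_grad():
            feats = self.pretrained(x)
        return self.new_head(feats)

model = TransferPolicy(PretrainedPolicy())
# Hand the optimizer only the trainable parameters.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1
)

x = torch.randn(16, 4)
loss = model(x).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the backward pass, only `new_head` receives gradients; the frozen backbone is untouched by the optimizer step.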
Thanks for your help! Please let us know whether our solution is correct and follows the RLlib philosophy, or whether you already have a much easier solution for this.