How severe does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
I trained a DQN algorithm to obtain an optimal policy for controlling actions in a custom environment. For evaluation I compared the learned policy with a conventional rule-based policy, and the learned policy did not achieve better results. For that reason I would like to take advantage of the rule-based policy and pre-train the model with it.
While exploring the documentation I found two options for reaching my objective.
The first is working with offline data, specifically converting external experiences to batch format, where I would need to write code similar to the example for my application. However, since my environment can be simulated, I explored other options.
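For reference, the general shape of that first option is to roll out the rule-based policy in the simulated environment and record transitions in the column-batch format that offline writers such as RLlib's `JsonWriter` expect. The sketch below is dependency-free and hypothetical: `rule_based_policy` and the toy dynamics stand in for your own controller and environment.

```python
import random

def rule_based_policy(obs):
    # Placeholder for your conventional rule-based controller.
    return 0 if obs < 0.5 else 1

def collect_batch(num_steps, seed=0):
    """Roll out the rule-based policy in a simulated env and record
    transitions as columns (obs, actions, rewards, new_obs, dones),
    the layout an offline-data writer would then serialize."""
    rng = random.Random(seed)
    batch = {"obs": [], "actions": [], "rewards": [], "new_obs": [], "dones": []}
    obs = rng.random()
    for t in range(num_steps):
        action = rule_based_policy(obs)
        # Toy dynamics: reward for matching the rule's threshold.
        reward = 1.0 if action == (obs >= 0.5) else 0.0
        next_obs = rng.random()
        done = (t == num_steps - 1)
        for key, value in zip(batch, (obs, action, reward, next_obs, done)):
            batch[key].append(value)
        obs = next_obs
    return batch

batch = collect_batch(100)
print(len(batch["obs"]))  # 100 recorded transitions
```

In a real setup you would replace the toy dynamics with your simulator and hand each completed batch to RLlib's offline-data writer instead of keeping it in memory.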
The second is to implement a custom exploration model, use it to train the DQN/PG/other algorithm first, and obtain the pre-trained model that way. However, I didn't find any documentation on how to implement a custom exploration model in DQN.
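Conceptually, the second option amounts to replacing the random branch of epsilon-greedy with the rule-based policy, so early training imitates the rules and gradually hands control to the learned Q-values. This is only a sketch of that idea, not RLlib's actual `Exploration` API (class names and hooks differ by version); `rule_based_action` is a placeholder:

```python
import random

def rule_based_action(obs):
    # Stand-in for your rule-based controller.
    return 0 if obs < 0.5 else 1

def guided_epsilon_greedy(obs, q_values, epsilon):
    """Epsilon-greedy variant: with probability epsilon, follow the
    rule-based policy instead of a uniformly random action; otherwise
    exploit the learned Q-values."""
    if random.random() < epsilon:
        return rule_based_action(obs)
    # Greedy action with respect to the current Q-estimates.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon=1.0 the agent always imitates the rules.
print(guided_epsilon_greedy(0.2, [0.1, 0.9], epsilon=1.0))  # 0
```

Annealing `epsilon` from 1.0 toward a small value over training would then move the agent smoothly from imitating the rules to acting on its own Q-function.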
Does anyone have experience with pre-training a model with a rule-based policy? Any recommendations or examples of how I could implement one of the two methods?
Thank you so much! Regards