Pre-train a model with a baseline policy

How severe does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hi!

I trained a DQN algorithm to obtain an optimal policy to control actions in a custom environment. For the evaluation, I compared the learned policy with a conventional rule-based policy. The learned policy did not achieve better results. For that reason, I would like to take advantage of the rule-based policy and pre-train the model with it.

To that end, I explored the documentation and saw two options to reach my objective.

The first one is working with offline data, specifically converting the external experiences to batch format, where I would have to write code similar to the documentation example for my application. However, my environment can be simulated, so I explored other options.
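
To make the first option concrete, here is a minimal sketch of how I imagine collecting the experiences: roll out the rule-based policy in the simulator and log every transition. `ToyEnv`, `rule_based_policy` and `collect_transitions` are hypothetical stand-ins for my environment and controller, and the dictionary keys are my own choice rather than RLlib's batch format, so the records would still have to be converted as in the documentation example.

```python
import json
import random


class ToyEnv:
    """Hypothetical stand-in for the custom environment (1-D position control)."""

    def __init__(self, horizon=20, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.pos = self.rng.uniform(-1.0, 1.0)
        return [self.pos]

    def step(self, action):
        # Action 0 moves left, action 1 moves right.
        self.pos += 0.1 if action == 1 else -0.1
        self.t += 1
        reward = -abs(self.pos)
        done = self.t >= self.horizon
        return [self.pos], reward, done


def rule_based_policy(obs):
    """Hypothetical conventional controller: always move toward zero."""
    return 1 if obs[0] < 0 else 0


def collect_transitions(env, policy, num_episodes):
    """Roll out `policy` and return one dict per environment step."""
    transitions = []
    for episode_id in range(num_episodes):
        obs = env.reset()
        done = False
        while not done:
            action = policy(obs)
            next_obs, reward, done = env.step(action)
            transitions.append({
                "eps_id": episode_id,
                "obs": obs,
                "action": action,
                "reward": reward,
                "next_obs": next_obs,
                "done": done,
            })
            obs = next_obs
    return transitions


transitions = collect_transitions(ToyEnv(), rule_based_policy, num_episodes=3)
# One JSON object per line, ready to be written to a file for later conversion.
json_lines = [json.dumps(t) for t in transitions]
```

Since the environment can be simulated, the same loop could generate as many episodes as needed instead of relying on a fixed external dataset.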

The second one is to implement a custom exploration model and use it to train the DQN/PG/other algorithm first, obtaining the pre-trained model that way. However, I didn't find documentation on how to implement a custom exploration model in DQN.
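
For the second option, the behaviour I am after is roughly the following: follow the rule-based action with a probability that decays over training, and otherwise take the greedy action from the Q-values. This is only a plain-Python sketch of the idea, not RLlib's exploration API. All function names and the linear schedule are my own assumptions, and I don't know how to plug this into DQN.

```python
import random


def rule_based_action(obs):
    """Hypothetical rule-based controller: move toward zero."""
    return 1 if obs[0] < 0 else 0


def greedy_action(q_values):
    """Index of the action with the highest estimated Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def rule_probability(step, decay_steps, start=1.0, end=0.0):
    """Linearly anneal the probability of following the rule-based policy."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


def guided_action(obs, q_values, step, decay_steps, rng):
    """Follow the rule with probability p(step), otherwise act greedily."""
    if rng.random() < rule_probability(step, decay_steps):
        return rule_based_action(obs)
    return greedy_action(q_values)


rng = random.Random(0)
obs = [-0.5]            # the rule picks action 1 here
q_values = [0.9, 0.1]   # an untrained network might prefer action 0
early = guided_action(obs, q_values, step=0, decay_steps=1000, rng=rng)
late = guided_action(obs, q_values, step=1000, decay_steps=1000, rng=rng)
```

Early in training the agent would mostly imitate the rule-based controller and fill the replay buffer with its transitions. Once the schedule has decayed, it would act on its own Q-values.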

Does anyone have experience with pre-training a model with a rule-based policy? Any recommendations or examples of how I could implement one of the two methods?

Thank you so much! Regards