How severe does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
I trained a DQN algorithm to obtain an optimal policy for controlling actions in a custom environment. For evaluation I compared the learned policy with a conventional rule-based policy, and the learned policy did not achieve better results. For that reason I would like to take advantage of the rule-based policy and pre-train the model with it.
While exploring the documentation I found two options for reaching my objective.
The first is working with offline data, specifically converting external experiences to batch format, where I would need to write code similar to the example for my application. However, since my environment can be simulated, I explored other options.
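For reference, the general shape of that first option is to roll out the rule-based policy in the simulated environment and record transitions in the column-batch format that offline writers such as RLlib's `JsonWriter` expect. The sketch below is dependency-free and hypothetical: `rule_based_policy` and the toy dynamics stand in for your own controller and environment.

```python
import random

def rule_based_policy(obs):
    # Placeholder for your conventional rule-based controller.
    return 0 if obs < 0.5 else 1

def collect_batch(num_steps, seed=0):
    """Roll out the rule-based policy in a simulated env and record
    transitions as columns (obs, actions, rewards, new_obs, dones),
    the layout an offline-data writer would then serialize."""
    rng = random.Random(seed)
    batch = {"obs": [], "actions": [], "rewards": [], "new_obs": [], "dones": []}
    obs = rng.random()
    for t in range(num_steps):
        action = rule_based_policy(obs)
        # Toy dynamics: reward for matching the rule's threshold.
        reward = 1.0 if action == (obs >= 0.5) else 0.0
        next_obs = rng.random()
        done = (t == num_steps - 1)
        for key, value in zip(batch, (obs, action, reward, next_obs, done)):
            batch[key].append(value)
        obs = next_obs
    return batch

batch = collect_batch(100)
print(len(batch["obs"]))  # 100 recorded transitions
```

In a real setup you would replace the toy dynamics with your simulator and hand each completed batch to RLlib's offline-data writer instead of keeping it in memory.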
The second is to implement a custom exploration model, use it to train the DQN/PG/other algorithm first, and obtain the pre-trained model that way. However, I didn't find any documentation on how to implement a custom exploration model in DQN.
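Conceptually, the second option amounts to replacing the random branch of epsilon-greedy with the rule-based policy, so early training imitates the rules and gradually hands control to the learned Q-values. This is only a sketch of that idea, not RLlib's actual `Exploration` API (class names and hooks differ by version); `rule_based_action` is a placeholder:

```python
import random

def rule_based_action(obs):
    # Stand-in for your rule-based controller.
    return 0 if obs < 0.5 else 1

def guided_epsilon_greedy(obs, q_values, epsilon):
    """Epsilon-greedy variant: with probability epsilon, follow the
    rule-based policy instead of a uniformly random action; otherwise
    exploit the learned Q-values."""
    if random.random() < epsilon:
        return rule_based_action(obs)
    # Greedy action with respect to the current Q-estimates.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon=1.0 the agent always imitates the rules.
print(guided_epsilon_greedy(0.2, [0.1, 0.9], epsilon=1.0))  # 0
```

Annealing `epsilon` from 1.0 toward a small value over training would then move the agent smoothly from imitating the rules to acting on its own Q-function.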
Does anyone have experience with pre-training a model with a rule-based policy? Any recommendations or examples of how I could implement one of the two methods?
Thank you so much! Regards