I want to customize RLlib’s DQN so that it outputs n (say, 10) Q-values, where each Q-value uses a different discount factor gamma that is also passed as an input argument. I am trying to implement the architecture shown in Figure 9 (page 14) of this paper. I have 2 questions:
- Can I define a CustomModel class, based on this RLlib example code, that implements this architecture? Is this doable without interfering with the rest of RLlib (which I am still learning and am no expert in)? I want to use a TF model.
- What will happen to RLlib’s config [gamma]? I don’t want a single fixed gamma value; instead, I want to pass a list of gammas (one per Q-value) when the neural network is created. I am not sure how config [gamma] will behave in this case.
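To make the second question concrete, here is a minimal plain-Python sketch of the per-gamma one-step TD targets I have in mind. It does not use any RLlib API. The function name `multi_gamma_td_targets`, the `next_q[b][i][a]` layout, and the choice to take the max over actions separately for each gamma head are my own assumptions for illustration, not something taken from the paper or from RLlib.

```python
from typing import List, Sequence


def multi_gamma_td_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q: Sequence[Sequence[Sequence[float]]],
    gammas: Sequence[float],
) -> List[List[float]]:
    """One-step TD targets with one target per discount factor.

    Hypothetical layout: next_q[b][i][a] is the target-network Q-value for
    batch item b, gamma head i, and action a.
    Returns targets[b][i] = r_b + gamma_i * max_a next_q[b][i][a],
    with the bootstrap term dropped on terminal transitions.
    """
    targets = []
    for reward, done, q_heads in zip(rewards, dones, next_q):
        row = []
        for gamma, q_actions in zip(gammas, q_heads):
            # Assumption: each head bootstraps from its own greedy action.
            bootstrap = 0.0 if done else gamma * max(q_actions)
            row.append(reward + bootstrap)
        targets.append(row)
    return targets


# Toy batch: 2 transitions, 3 gamma heads, 2 actions.
gammas = [0.5, 0.9, 0.99]
rewards = [1.0, 0.0]
dones = [False, True]
next_q = [
    [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]],
    [[3.0, 4.0], [3.0, 4.0], [3.0, 4.0]],
]
targets = multi_gamma_td_targets(rewards, dones, next_q, gammas)
```

With a single gamma this reduces to the usual DQN target, which is why I am unsure how a scalar config [gamma] would interact with a list like `gammas` above.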
I am thankful to the Ray team and would appreciate any help/pointers in this direction. Thank you.