Hi, I want to write my own credit assignment function instead of relying on the discount factor. Many problems do not fit discount-factor credit assignment, which gives more weight to the last actions. Sometimes a single action (for example, in the middle of the trajectory) deserves more of the credit for the reward, and a customized function could assign credit accordingly. My questions are:
1- How can I do this in RLlib? Should I subclass and override the postprocess step?
2- Is there any example of doing this?
" in some basketball match we give more credit on an excellent pass (with some logics) against last move (maybe because its easy) while discount factor do opposite and gives more credit on last action.