Custom LSTM Model for R2D2

Can anyone give me a hint on how to set up a custom network with an LSTM for the R2D2 algorithm? In principle I have something like observation -> linear1 -> linear2 -> LSTM -> Q-values. The API is very poorly documented and hard to follow. In particular, it looks like overriding forward_rnn is not enough: I also have to override the get_q_values function, for a reason I can't work out because it is undocumented. I'm trying to make sense of the code here, but I can't follow it:
Thanks in advance.
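
To make the intended architecture concrete, here is a minimal sketch in plain PyTorch. It only shows the observation -> linear1 -> linear2 -> LSTM -> Q-values data flow and is not wired into the library's model API (it does not touch forward_rnn or get_q_values). The class name RecurrentQNet, the initial_state helper, and the layer sizes are placeholders I made up for illustration.

```python
import torch
import torch.nn as nn


class RecurrentQNet(nn.Module):
    """observation -> linear1 -> linear2 -> LSTM -> Q-values (illustrative only)."""

    def __init__(self, obs_dim, num_actions, hidden_dim=64, lstm_dim=64):
        super().__init__()
        self.linear1 = nn.Linear(obs_dim, hidden_dim)
        self.linear2 = nn.Linear(hidden_dim, hidden_dim)
        # batch_first=True means inputs are shaped [batch, time, features]
        self.lstm = nn.LSTM(hidden_dim, lstm_dim, batch_first=True)
        self.q_head = nn.Linear(lstm_dim, num_actions)

    def initial_state(self, batch_size):
        # Zero hidden and cell state, each shaped [num_layers=1, batch, lstm_dim]
        h = torch.zeros(1, batch_size, self.lstm.hidden_size)
        c = torch.zeros(1, batch_size, self.lstm.hidden_size)
        return h, c

    def forward(self, obs, state):
        # obs: [batch, time, obs_dim]; state: tuple (h, c)
        x = torch.relu(self.linear1(obs))
        x = torch.relu(self.linear2(x))
        x, new_state = self.lstm(x, state)
        q_values = self.q_head(x)  # one Q-value per action at every time step
        return q_values, new_state


net = RecurrentQNet(obs_dim=4, num_actions=2)
obs = torch.randn(3, 5, 4)  # 3 sequences, 5 time steps, 4 observation features
q_values, (h, c) = net(obs, net.initial_state(batch_size=3))
print(q_values.shape)  # torch.Size([3, 5, 2])
```

The open question is how to map these pieces onto forward_rnn and get_q_values in the custom model class.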