I have seen the example of how to create PPO with a centralized critic and it’s been useful, thanks!
I’m trying to adapt this codebase (using Torch) to use an LSTM or GRU instead of just a feed-forward network, but so far I’ve been unsuccessful.
Is there an example of, or any hint on, how to use an LSTM as a centralized critic (preferably with PPO)?
Specifically, I’m having trouble understanding how to properly set seq_lens,
but maybe I’ve just gone down the wrong rabbit hole and there is an easier way to do this.
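To make the seq_lens part concrete, here is a minimal plain-Python sketch of what I understand seq_lens to describe: episodes are cut into chunks of at most max_seq_len steps, each chunk is padded to that length, and seq_lens records how many steps in each chunk are real. This is only an illustration of the idea, not RLlib's actual code; the helper name chop_and_pad and the toy data are made up for the example.

```python
from typing import List, Tuple


def chop_and_pad(
    episodes: List[List[float]],
    max_seq_len: int,
    pad_value: float = 0.0,
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: split episodes into padded chunks.

    Returns the flat padded data (length = num_chunks * max_seq_len)
    and seq_lens, the number of real (unpadded) steps per chunk.
    """
    padded: List[float] = []
    seq_lens: List[int] = []
    for episode in episodes:
        # Chunks never cross an episode boundary.
        for start in range(0, len(episode), max_seq_len):
            chunk = episode[start:start + max_seq_len]
            seq_lens.append(len(chunk))
            # Right-pad short chunks so every chunk has max_seq_len entries.
            padded.extend(chunk + [pad_value] * (max_seq_len - len(chunk)))
    return padded, seq_lens


# Toy data: one episode of 5 steps and one of 3 steps.
episodes = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
padded, seq_lens = chop_and_pad(episodes, max_seq_len=4)
print(seq_lens)     # [4, 1, 3]
print(len(padded))  # 12, i.e. 3 chunks * max_seq_len 4
```

If that picture is right, the recurrent critic would need to see its inputs reshaped to [len(seq_lens), max_seq_len, ...] together with the same seq_lens, so that the padded steps can be ignored.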
I’m using RLlib 1.6.
Thanks!