[rllib] Will the hidden state of an rnn policy be reset by default at the end of an episode?

I’m relatively new to Ray/RLlib and have been puzzling over a question that I haven’t been able to answer yet: Is the hidden state of an RNN model derived from RecurrentNetwork, for example as described in the documentation, automatically reset at the end of an episode? If so, is get_initial_state() called again, or how are the initial values obtained? In which module does this happen? First of all, I’d like to know whether this is handled at all. I’d be very grateful for any help, as this problem has been bothering me for a while.
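For context, here is a minimal sketch of the kind of model I mean, roughly following the docs example (sizes and names here are just placeholders):

```python
import numpy as np
import torch
import torch.nn as nn

from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.torch.recurrent_net import RecurrentNetwork as TorchRNN
from ray.rllib.utils.annotations import override


class MyRNNModel(TorchRNN, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name,
                 fc_size=64, lstm_state_size=256):
        nn.Module.__init__(self)
        super().__init__(obs_space, action_space, num_outputs, model_config, name)

        self.obs_size = int(np.product(obs_space.shape))
        self.fc_size = fc_size
        self.lstm_state_size = lstm_state_size

        self.fc1 = nn.Linear(self.obs_size, self.fc_size)
        self.lstm = nn.LSTM(self.fc_size, self.lstm_state_size, batch_first=True)
        self.action_branch = nn.Linear(self.lstm_state_size, num_outputs)
        self.value_branch = nn.Linear(self.lstm_state_size, 1)
        self._features = None

    @override(ModelV2)
    def get_initial_state(self):
        # Zero-initialized h and c states -- this is what I would expect RLlib
        # to re-use at the start of every new episode.
        return [
            self.fc1.weight.new(1, self.lstm_state_size).zero_().squeeze(0),
            self.fc1.weight.new(1, self.lstm_state_size).zero_().squeeze(0),
        ]

    @override(ModelV2)
    def value_function(self):
        return torch.reshape(self.value_branch(self._features), [-1])

    @override(TorchRNN)
    def forward_rnn(self, inputs, state, seq_lens):
        # inputs: [B, T, obs_size]; state: [h, c], each [B, lstm_state_size]
        x = nn.functional.relu(self.fc1(inputs))
        self._features, [h, c] = self.lstm(
            x, [torch.unsqueeze(state[0], 0), torch.unsqueeze(state[1], 0)])
        action_out = self.action_branch(self._features)
        return action_out, [torch.squeeze(h, 0), torch.squeeze(c, 0)]
```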

Hey @LukasNothhelfer, RLlib automatically resets the internal state at the beginning of each episode. Note that the internal state is not stored inside the model itself, but “carried” by the RolloutWorkers and their SampleCollectors. At the beginning of an episode, RLlib uses the initial state defined either by the model via its get_initial_state() method or by the model’s view-requirements dict.
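You can also inspect directly what RLlib will feed in at episode start. A rough sketch (here, trainer stands for whatever Trainer instance you built): the policy’s get_initial_state() delegates to your model’s get_initial_state().

```python
# Hypothetical quick check -- "trainer" stands for your Trainer instance.
policy = trainer.get_policy()

# Policy.get_initial_state() delegates to model.get_initial_state(); this is
# what the sample collection path uses at the start of a new episode.
init_state = policy.get_initial_state()
print(init_state)  # e.g. [h0, c0] -- two all-zero arrays for an LSTM
```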
In other words, you should be fine. :slight_smile: To confirm, you can print out the state tensors being passed into your forward passes and check that they are all 0.0s (or whatever init value you defined) at the beginning of each episode.
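For example, continuing the model sketch from the question above (just a debug aid, nothing RLlib-specific):

```python
    @override(TorchRNN)
    def forward_rnn(self, inputs, state, seq_lens):
        # Debug aid: at the first timestep of a new episode these tensors
        # should equal whatever get_initial_state() returned (all zeros here).
        print("max |h_in|:", state[0].abs().max().item(),
              "max |c_in|:", state[1].abs().max().item())
        x = nn.functional.relu(self.fc1(inputs))
        self._features, [h, c] = self.lstm(
            x, [torch.unsqueeze(state[0], 0), torch.unsqueeze(state[1], 0)])
        action_out = self.action_branch(self._features)
        return action_out, [torch.squeeze(h, 0), torch.squeeze(c, 0)]
```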