Understanding state_batches in compute_actions

Hi @Lars_Simon_Zehnder,

I saw you liked my post on states in RLlib.

In that post I mentioned:

“When you are using an rnn you only get the initial state of the sequence. The other states are generated internally by the rnn logic.”

This is likely the issue. To verify it, check the shapes of [obs, seq_lens, and state_in_0[0]] and see how they compare. If this is the issue, you will see that the batch dimension of obs is larger than the length of seq_lens, while the batch dimension of state_in_0 equals the length of seq_lens.
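Here is a minimal sketch of that shape check. It uses a plain dict of NumPy arrays as a stand-in for the train batch, so the toy data, the helper name summarize_rnn_batch, and the sizes are all made up for illustration. I am assuming your real batch can be indexed with the same "obs", "seq_lens" and "state_in_0" keys.

```python
import numpy as np


def summarize_rnn_batch(batch):
    """Compare the leading (batch) dimensions of obs, seq_lens and state_in_0.

    `batch` is assumed to be dict-like with the keys "obs", "seq_lens"
    and "state_in_0". This helper is hypothetical, not part of RLlib.
    """
    obs_rows = np.asarray(batch["obs"]).shape[0]
    num_seqs = np.asarray(batch["seq_lens"]).shape[0]
    state_rows = np.asarray(batch["state_in_0"]).shape[0]
    return {
        "obs_rows": obs_rows,
        "num_seqs": num_seqs,
        "state_rows": state_rows,
        # True when there is one state row per sequence instead of one
        # per timestep, i.e. only the initial states are present.
        "only_initial_states": state_rows == num_seqs and obs_rows > num_seqs,
    }


# Toy data: 3 sequences of lengths 4, 2 and 3 (9 timesteps in total),
# observations of size 5 and an RNN state of size 8.
seq_lens = np.array([4, 2, 3])
toy_batch = {
    "obs": np.zeros((int(seq_lens.sum()), 5)),
    "seq_lens": seq_lens,
    "state_in_0": np.zeros((len(seq_lens), 8)),
}

summary = summarize_rnn_batch(toy_batch)
print(summary)
```

If only_initial_states comes out True on your real batch, you are looking at one state per sequence rather than one per timestep.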

An option that might work is to give the trajectory view variable that tracks state_in a different name, one whose key does not contain "state". This might circumvent the special state handling that happens when the sample batch is prepared. I am just guessing here.
