Understanding state_batches in compute_actions

mannyv · July 24, 2021, 11:34am

I saw you liked my post on states in rllib.

In there I mentioned

“When you are using an rnn you only get the initial state of the sequence. The other states are generated internally by the rnn logic.”

This is likely the issue. To verify it, compare the shapes of obs, seq_lens, and state_in_0[0]. If this is the issue, you will see that the obs batch size is larger than the length of seq_lens, while the state_in_0 batch size is equal to the length of seq_lens.
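
Here is a rough sketch of that check, using a plain dict of numpy arrays as a stand-in for the sample batch. The helper names and the fake batch are made up for illustration and are not part of rllib; only the keys obs, seq_lens and state_in_0 come from the discussion above. I am also assuming an unpadded batch here.

```python
import numpy as np


def summarize_batch_shapes(batch):
    """Return the leading (batch) dimension of the three keys of interest."""
    keys = ("obs", "seq_lens", "state_in_0")
    return {key: np.asarray(batch[key]).shape[0] for key in keys}


def looks_like_initial_state_only(batch):
    """True if there is one state row per sequence rather than per timestep."""
    sizes = summarize_batch_shapes(batch)
    return (
        sizes["state_in_0"] == sizes["seq_lens"]
        and sizes["obs"] > sizes["seq_lens"]
    )


# Hypothetical batch: 3 sequences of lengths 4, 2 and 5, so 11 timesteps total.
seq_lens = np.array([4, 2, 5])
fake_batch = {
    "obs": np.zeros((int(seq_lens.sum()), 8)),        # one row per timestep
    "seq_lens": seq_lens,                             # one entry per sequence
    "state_in_0": np.zeros((len(seq_lens), 16)),      # one row per sequence
}

sizes = summarize_batch_shapes(fake_batch)
print(sizes)
print(looks_like_initial_state_only(fake_batch))
```

With a real batch you would pass in the arrays you get inside your model or loss function instead of fake_batch.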

An option that might work is to give the trajectory view variable that tracks state_in a different name, one that does not include “state” in the key. This might circumvent the special state handling when preparing the sample batch. I am just guessing here.
