Custom Recurrent Network and TrajectoryView

It’s unclear to me how recurrent models work with the new trajectory view API. Consider this model:

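    from typing import List
    import torch
    from ray.rllib.models.torch.recurrent_net import RecurrentNetwork
    from ray.rllib.utils.typing import TensorType

    class MyRNN(RecurrentNetwork):  # hypothetical name; __init__ etc. omitted
        def forward_rnn(
            self, inputs: TensorType, state: List[TensorType], seq_lens: TensorType
        ) -> (TensorType, List[TensorType]):
            print('got state', state)
            logits = ...  # model computation omitted from this excerpt
            state = [torch.tensor([1, 2, 3])]  # return a non-empty state
            return logits, state

        def get_initial_state(self):
            return []

Each call prints an empty state, even though forward_rnn returns a non-empty one:

    got state []
    got state []
    got state []
    got state []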

I want to build an accumulating representation whose size grows with the number of timesteps in the episode, so different entries in a batch could have different state shapes.

Would anyone be able to provide some guidance on how to do this, or any insight on the interplay between get_initial_state, forward_rnn and TrajectoryView?

If my model inherits from TorchModelV2 instead of RecurrentNetwork, can I still propagate state?

@sven1977 do you know the answer to this?


I’ve found that if the shape of the state returned from forward does not match the shape returned by get_initial_state, the state is replaced with the output of a fresh get_initial_state call, which is a bit confusing.
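
To make that behavior concrete, here is a toy loop in plain NumPy. It is not RLlib's sampler code: toy_rollout and state_shapes are hypothetical helpers that only encode the rule reported above (a returned state whose shapes differ from the initial state's is discarded and replaced by a fresh initial state). With an empty initial state, as in the model above, every step receives [].

```python
import numpy as np


def state_shapes(state):
    """Shapes of each tensor in a state list."""
    return [np.shape(s) for s in state]


def toy_rollout(get_initial_state, forward, num_steps):
    """Toy loop (not RLlib code): feed each step's returned state into the next step,
    falling back to a fresh initial state on a shape mismatch, as reported above."""
    initial = get_initial_state()
    state = initial
    received = []
    for _ in range(num_steps):
        received.append(state)
        new_state = forward(state)
        if state_shapes(new_state) != state_shapes(initial):
            new_state = get_initial_state()  # mismatched state is thrown away
        state = new_state
    return received


def step(state):
    # Mirrors the forward_rnn above: always returns a 3-element state.
    return [np.array([1, 2, 3])]


# Empty initial state: every step receives [].
print(toy_rollout(lambda: [], step, 4))
# Initial state with matching shape (3,): the returned state is passed along.
print(toy_rollout(lambda: [np.zeros(3)], step, 3))
```

Under this rule, returning from get_initial_state a state with the same shape as the one forward_rnn returns is what lets the state propagate.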

TrajectoryView simply selects which fields from the SampleBatch to load in forward, using a modified slice notation in which both ends are inclusive: -5:0 returns the current observation plus the five before it (six in total). If only a single observation has occurred, the view zero-pads the remaining five slots. There is no state field in a SampleBatch object; however, the torch attention implementation contains code that uses state_in and state_out. It’s not clear to me where these get populated.
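
A minimal NumPy sketch of these windowing semantics, assuming both ends of the range are inclusive (so "-5:0" covers six timesteps). parse_shift and view are hypothetical helpers for illustration, not RLlib functions:

```python
import numpy as np


def parse_shift(shift):
    """Turn a range string such as "-5:0" into offsets, both ends inclusive (assumed)."""
    start, stop = (int(x) for x in shift.split(":"))
    return list(range(start, stop + 1))


def view(column, t, shift):
    """Gather column[t + offset] for each offset, zero-padding before the episode start."""
    offsets = parse_shift(shift)
    rows = [column[t + o] if t + o >= 0 else np.zeros_like(column[0]) for o in offsets]
    return np.stack(rows)


obs = np.arange(1, 9, dtype=np.float32).reshape(8, 1)  # 8 timesteps, 1 feature each
print(view(obs, t=0, shift="-5:0").ravel())  # five zero-padded slots, then the current obs
print(view(obs, t=7, shift="-5:0").ravel())  # observations from timesteps 2..7
```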

I would like to do something similar to attention, where the state size varies based on the number of timesteps. Both TrajectoryView and state/get_initial_state require you to determine the state size ahead of time. This means that with episodes of length 500, even at t=0 you load large (500, n) matrices into memory, 499 rows of which are zero.
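
The arithmetic behind that concern, as a small NumPy sketch; n = 64 features and float32 storage are assumed values, not taken from the post:

```python
import numpy as np

max_len, n = 500, 64  # episode length from the post; n = 64 is an assumed value
t = 0  # first timestep of the episode

state = np.zeros((max_len, n), dtype=np.float32)  # fixed-size (500, n) buffer
state[max_len - (t + 1):] = 1.0  # only t + 1 rows hold real data (placeholder values)

zero_rows = int((~state.any(axis=1)).sum())
used_bytes = (t + 1) * n * state.itemsize
print(zero_rows)  # 499
print(state.nbytes, "bytes allocated vs", used_bytes, "bytes of real data")
```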

I suspect something more efficient is going on in the GTrXL implementation, but I’m having trouble wrapping my head around it.

A flexible state tensor size based on the timestep is currently not possible. The state inputs always have the same size, and this size is statically defined via the view_requirements at construction time.
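
One way to see why a fixed size is needed: a batch has to be a single rectangular tensor. The NumPy analogy below (not RLlib code; the sizes and the accumulating states are hypothetical) shows that ragged per-entry states cannot be stacked, while padding each one to a static length can:

```python
import numpy as np

state_size = 4
max_len = 6  # static length, standing in for a size fixed at construction time
# Hypothetical accumulating states for three batch entries at timesteps 0, 2 and 5.
states = [np.ones((t + 1, state_size)) for t in (0, 2, 5)]

try:
    np.stack(states)  # shapes (1, 4), (3, 4), (6, 4) cannot form one tensor
except ValueError as err:
    print("cannot batch ragged states:", err)

# Left-padding each entry with zero rows up to max_len makes the batch rectangular.
padded = np.stack([np.pad(s, ((max_len - len(s), 0), (0, 0))) for s in states])
print(padded.shape)  # (3, 6, 4)
```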

However, the trajectory view API allows RLlib to store each timestep’s state only once, as 1 x [state size] (whereas earlier versions of RLlib had to store n x [state size] for each timestep, where n is the attention field size). In other words, I don’t think you lose much performance because of the fixed size. Creating a fixed-size input tensor from a list of single-timestep states is relatively fast, and the difference between creating a large tensor and a smaller one is marginal (I tested this myself when I wrote the new SampleCollector mechanism).
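
A rough sketch of the storage scheme described above, in plain NumPy rather than RLlib's SampleCollector and under assumed sizes (T = 500, n = 50, state size S = 64). Each timestep's state is kept once as a length-S vector, and the fixed (n, S) input is stacked on demand, zero-padded before the episode start. window is a hypothetical helper. The timing loop lets you check the large-vs-small claim on your own machine; no particular numbers are implied.

```python
import timeit

import numpy as np

T, n, S = 500, 50, 64  # episode length, attention field size, state size (assumed)
rng = np.random.default_rng(0)
per_step = [rng.standard_normal(S).astype(np.float32) for _ in range(T)]  # one 1 x S state per step


def window(states, t, size):
    """Stack the `size` states preceding timestep t into a fixed (size, S) array,
    zero-padding slots that fall before the episode start."""
    rows = [states[i] if i >= 0 else np.zeros_like(states[0]) for i in range(t - size, t)]
    return np.stack(rows)


# Old scheme: a full n x S window stored at every timestep; new scheme: 1 x S per timestep.
old_floats = T * n * S
new_floats = T * S
print(old_floats // new_floats)  # storage ratio equals n

for size in (5, n):
    secs = timeit.timeit(lambda: window(per_step, T - 1, size), number=1000)
    print(f"size={size}: {secs * 1e3:.2f} ms for 1000 windows")
```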
