got state []
got state []
got state []
got state []
...
I want to build an accumulating representation that scales with the number of episodic timesteps, so various entries in a batch could have different state shapes.
Would anyone be able to provide some guidance on how to do this, or any insight on the interplay between get_initial_state, forward_rnn and TrajectoryView?
If my model inherits from TorchModelV2 instead of RecurrentNetwork, can I still propagate state?
I’ve found that if the state shape returned from forward does not match what is returned by get_initial_state, the state will be replaced with a call to get_initial_state, which is a bit confusing.
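To illustrate the pitfall described above, here is a small self-contained sketch (not RLlib internals; the function names and the 1 x 64 state shape are my own choices for illustration) of the behavior I observed: when the state returned by forward doesn't match the shapes from get_initial_state, it gets replaced by a fresh initial state.

```python
import numpy as np

def get_initial_state():
    # Fixed-size state: shapes here must match what forward() returns.
    return [np.zeros((1, 64), dtype=np.float32)]

def reconcile_state(state_out, initial_state):
    # Illustration of the observed behavior: if any returned state tensor
    # has a shape that differs from the initial state, the whole state is
    # silently replaced by a call to get_initial_state().
    mismatched = any(s.shape != i.shape for s, i in zip(state_out, initial_state))
    return get_initial_state() if mismatched else state_out
```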
TrajectoryView simply selects which fields from the SampleBatch to load in forward, using modified slice notation: -5:0 returns the last five observations. If only a single observation has occurred, the view will zero-pad to a size of five observations. There is no state field in a SampleBatch object; however, there is code in the torch attention implementation that uses state_in and state_out. It’s not clear to me where these get populated.
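The slicing-plus-zero-padding behavior can be emulated in a few lines of NumPy (a sketch of my understanding, not the actual TrajectoryView code; the function name is mine):

```python
import numpy as np

def view_last_n(obs_buffer, n=5):
    # Emulates a trajectory view over the last n observations:
    # return the most recent n entries, zero-padded at the front
    # if fewer than n observations have occurred so far.
    obs = np.asarray(obs_buffer, dtype=np.float32)
    if len(obs) >= n:
        return obs[-n:]
    pad = np.zeros((n - len(obs),) + obs.shape[1:], dtype=np.float32)
    return np.concatenate([pad, obs], axis=0)
```

For example, with only two observations seen, `view_last_n([1.0, 2.0], n=5)` yields three leading zeros followed by the two real values.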
I would like to do something similar to attention, where the state size varies with the number of timesteps. Using either TrajectoryView or state/get_initial_state requires you to determine the state size ahead of time. This means that with episodes of length 500, even at t=0 you will be loading large (500, n) matrices into memory, with 499 of the rows being zero.
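To make the memory concern concrete, here is a small NumPy sketch (T=500 matches the episode length above; n=32 is an arbitrary feature dimension for illustration):

```python
import numpy as np

# Fixed-size state buffer for a max episode length of T=500 timesteps.
T, n = 500, 32
state = np.zeros((T, n), dtype=np.float32)

# At t=0 only the first row carries data; the remaining 499 rows are
# zero padding that still has to be allocated and moved around.
state[0] = np.ones(n)
nonzero_rows = int((np.abs(state).sum(axis=1) > 0).sum())
print(nonzero_rows, state.nbytes)  # 1 populated row; 500 * 32 * 4 bytes
```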
I suspect something more efficient is going on with the gtrxl implementation, but I’m having trouble wrapping my head around it.
A flexible state tensor size based on the timestep is currently not possible. The state inputs always have the same size, and this size is statically defined via the view_requirements at construction time.
However, the trajectory view API allows RLlib to store each timestep’s state as only 1 x [state size] (as opposed to earlier versions of RLlib, which had to store n x [state size] for every timestep, where n is the attention memory size). What I’m saying is: I don’t think you lose much performance because of this. Assembling a fixed-size input tensor from a list of single timesteps is relatively fast, and the difference between creating a large tensor and a smaller one is really marginal (I tested this myself when I wrote the new SampleCollector mechanism).
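The per-timestep storage scheme described above can be sketched like this (a minimal illustration, not the SampleCollector code; function names and sizes are my own):

```python
import numpy as np

# Per-timestep storage: each step keeps only a 1 x [state_size] entry.
# Here per_step[t] holds the state produced at timestep t.
state_size = 16
per_step = [np.full(state_size, float(t), dtype=np.float32) for t in range(100)]

def build_attention_input(per_step, t, memory=5):
    # Assemble the fixed-size memory x [state_size] tensor for timestep t
    # by stacking the last `memory` stored states, zero-padding early steps.
    start = max(0, t - memory + 1)
    window = per_step[start:t + 1]
    pad = [np.zeros(state_size, dtype=np.float32)] * (memory - len(window))
    return np.stack(pad + window)
```

The point is that the expensive n x [state size] tensor only exists transiently at batch-assembly time; the per-timestep store itself stays small.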