For something like the GTrXL implementation, how does the model receive input sequences (histories of inputs from previous timesteps) to attend over? Are they already provided as part of the input dict, or does GTrXL store past inputs itself to re-use for multi-headed attention?
My environment produces observations of Box(123,), yet the input dict receives observations of Box(32, 123). Is the 32 the batch size or the sequence length? I set my model's max_seq_len to 50, so I'm not sure where the 32 comes from, which is why I'm confused about how RLlib passes sequences/histories to the model.
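For context, here is roughly how I've configured the model. This is just a sketch: `MyEnv` is a placeholder for my actual environment (with a Box(123,) observation space), and I'm using what I understand to be RLlib's standard attention/GTrXL model keys:

```python
# Sketch of my config (MyEnv is a placeholder; values besides
# max_seq_len are illustrative defaults I'm experimenting with).
config = {
    "env": "MyEnv",  # observation space: Box(123,)
    "model": {
        "use_attention": True,   # enables the GTrXL attention wrapper
        "max_seq_len": 50,       # what I set, yet I see 32 in the obs batch dim
        "attention_num_transformer_units": 1,
        "attention_dim": 64,
        "attention_memory_inference": 50,
        "attention_memory_training": 50,
    },
}
```

Given this, I expected the leading dimension of the observations in the input dict to relate to max_seq_len (50), not 32.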