RLlib sequencing for GTrXL

Hi, I am using your GTrXL implementation and noticed that during inference the input has shape [batch_size, 1 (seq_len), feature_size].

But as far as I know, the whole point of these attention networks is that they attend over a sequence longer than 1, so they can evaluate and “look at” the whole sequence at once.

Why is it implemented this way? If the timesteps were days, we should not feed one day at a time but a couple of days together, right?

To add to this: if we want to obtain an action for an observation, shouldn't we feed observations from several timesteps rather than only one? That would mean the network uses all the observations across time (seq_len) to determine one action.

Or is this thinking incorrect? If it is, I would like to know how the network can still evaluate the whole sequence at once. As I said, I thought this is how these kinds of attention networks are meant to be used: LSTMs process one timestep at a time, whereas attention networks evaluate all timesteps at once.
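
To make the question concrete, here is a minimal pure-Python sketch of the two modes I am comparing. It is not RLlib code: `attend`, `memory`, and the toy observations are hypothetical names and values I made up, and it is a single attention step with no learned projections, gating, or positional encoding. I am only assuming that the [batch_size, 1, feature_size] input is combined with some cache of earlier timesteps, roughly like mode B; I have not verified this in the implementation.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the value vectors, one output feature at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


# Toy observations: 4 timesteps, feature_size = 2 (made-up numbers).
obs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

# Mode A: feed the whole window and read the output at the last timestep,
# which attends over timesteps 0..3.
out_full = attend(obs[-1], obs, obs)

# Mode B: feed one timestep at a time (seq_len = 1) and keep earlier inputs
# in a cache (hypothetical `memory`) that the current step attends over.
memory = []
for o in obs:
    context = memory + [o]
    out_step = attend(o, context, context)
    memory.append(o)

# In this toy setting the last-step output is identical in both modes.
assert all(abs(a - b) < 1e-9 for a, b in zip(out_full, out_step))
```

If GTrXL works like mode B during inference, then a seq_len of 1 would not prevent it from using earlier observations. Is that what is happening, and if so, where does the cached state come from?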