RLlib sequencing for GTrXL

How severely does this issue affect your experience of using Ray?

Medium: It causes significant difficulty in completing my task, but I can work around it.

Hi, I'm using your GTrXL implementation and noticed that during inference the input shape is [batch_size, seq_len=1, feature_size].

But as far as I know, the whole point of these attention networks is that they attend over a longer sequence length instead of 1, so they can evaluate and "look at" the whole sequence at once.

Why is it implemented this way? If the sequence length were days, we shouldn't feed one day at a time but several days together, right?
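To illustrate what I mean, here's a toy NumPy sketch (not RLlib's actual code; the shapes and T=5 "days" window are just made up for illustration) of attention evaluating a whole window at once, which is what I expected to see:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: batch of 2, a window of T=5 timesteps, feature_size=4.
batch_size, seq_len, feature_size = 2, 5, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(batch_size, seq_len, feature_size))

# Plain self-attention: every timestep attends over the whole window at once.
scores = x @ x.transpose(0, 2, 1) / np.sqrt(feature_size)  # [B, T, T]
weights = softmax(scores)                                   # rows sum to 1
out = weights @ x                                           # [B, T, F]
print(out.shape)  # (2, 5, 4)
```

So each of the T outputs is computed from all T inputs together, which is why a seq_len of 1 at inference confused me.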

Hope to hear from you soon!

Kind regards,
Dylan Prins


To add to this: if we want to obtain an action for an observation, shouldn't we feed observations from several timesteps rather than only one? That would mean the model uses all the observations through time (seq_len) to determine a single action.

Or is this thinking incorrect? If it is, I'd like to understand how the model can still evaluate the whole sequence at once. As I said, I thought this is how these kinds of attention networks are meant to be used: LSTMs process one timestep at a time, while attention networks evaluate all timesteps at once.
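If the answer is that the model carries a recurrent memory of past activations (Transformer-XL style, which is what GTrXL is based on), I could imagine something like this toy sketch, where the query is a single timestep but the keys/values include a cached memory, so attention still spans past timesteps even with seq_len=1. This is just my guess at the mechanism, not RLlib's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def step(x_t, memory):
    """One inference step: the query is a single timestep [B, 1, F], but
    keys/values concatenate the cached memory, so attention covers history."""
    d = x_t.shape[-1]
    kv = np.concatenate([memory, x_t], axis=1)         # [B, M+1, F]
    scores = x_t @ kv.transpose(0, 2, 1) / np.sqrt(d)  # [B, 1, M+1]
    out = softmax(scores) @ kv                         # [B, 1, F]
    new_memory = kv[:, 1:, :]                          # slide the fixed window
    return out, new_memory

batch, mem_len, feat = 1, 3, 4
memory = np.zeros((batch, mem_len, feat))  # empty memory at episode start
rng = np.random.default_rng(0)
for t in range(5):
    x_t = rng.normal(size=(batch, 1, feat))
    out, memory = step(x_t, memory)
print(out.shape, memory.shape)  # (1, 1, 4) (1, 3, 4)
```

Is something like this why the inference input can be [batch_size, 1, feature_size] without losing the attention over history?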