How to use self-defined tensors as padding observations for LSTM/Attention models

@mannyv, thanks for your suggestions! I agree, maybe we should store the boolean mask itself inside the sample batch. That would eliminate some duplicated code, I guess. Worth a try. On the other hand, the information is already all there in "seq_lens": reconstructing the mask is really just, e.g., tf.boolean_mask(tensor, tf.sequence_mask(seq_lens, max_seq_len)). But yeah, I'd say we'll do that. Would you like to do a PR to fix this, @mannyv?
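For reference, here's a minimal sketch of that reconstruction (the tensor names, shapes, and values are just illustrative, not RLlib internals):

```python
import tensorflow as tf

seq_lens = tf.constant([3, 1, 2])  # actual length of each sequence in the batch
max_seq_len = 3                    # padded length of the batch

# [B, max_seq_len] boolean mask: True for valid timesteps, False for padding.
mask = tf.sequence_mask(seq_lens, max_seq_len)

# Hypothetical padded batch of shape [B, max_seq_len, feature_dim].
padded = tf.random.normal([3, max_seq_len, 4])

# Drop the padded timesteps, leaving a [sum(seq_lens), feature_dim] tensor.
valid = tf.boolean_mask(padded, mask)
```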
Also, great catch on MARWIL! :slight_smile: We do say in the docs that it supports RNNs, but that's not true (I'll change that). The only off-policy algo that currently supports RNNs, afaik, is R2D2. We can probably borrow some of its logic for burn-in and related handling.