How to use self-defined tensors as padding observations for LSTM/Attention models

@mannyv, thanks for your suggestions! I agree, maybe we should store the boolean mask itself inside the sample batch. That would eliminate some duplicated code, I guess. Worth a try. On the other hand, the information is already all there in "seq_lens": reconstructing the mask is really just, e.g., tf.boolean_mask(tensor, tf.sequence_mask(seq_lens, max_seq_len)). But yeah, I'd say we'll do that. Would you like to do a PR to fix this, @mannyv?
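For reference, here's a minimal sketch of that reconstruction (the tensor names, shapes, and values are just illustrative, not RLlib internals):

```python
import tensorflow as tf

seq_lens = tf.constant([3, 1, 2])  # actual length of each sequence in the batch
max_seq_len = 3                    # padded length of the batch

# [B, max_seq_len] boolean mask: True for valid timesteps, False for padding.
mask = tf.sequence_mask(seq_lens, max_seq_len)

# Hypothetical padded batch of shape [B, max_seq_len, feature_dim].
padded = tf.random.normal([3, max_seq_len, 4])

# Drop the padded timesteps, leaving a [sum(seq_lens), feature_dim] tensor.
valid = tf.boolean_mask(padded, mask)
```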
Also, great catch on MARWIL! :slight_smile: We do say in the docs that it supports RNNs, but that's not true (I'll change that). The only off-policy algo that currently supports RNNs, afaik, is R2D2. We can probably borrow some of its logic for burn-in and related handling.