How to using self-definded tensor as padding observations for LSTM/Attention models

Shanchao_Yang · May 15, 2021, 8:55am

Hi

If the observation includes action masks, I think for padding all zeros obsevations for lstm is not expected. My code requires non-zeros for action masks.

But current rllib is using zeros to pad the input for lstm models. Related problem can be found here. I tried to replace it with non-zeros padding, but I could not find where to replace the zeros padding to lstm.

@sven1977 Could you help me where is the location for this function? Many thanks.

I am not sure that if checking the action mask is all zero or not for each data in the batched input_dict is good or not. Maybe there is a better solution?

mannyv · May 15, 2021, 12:59pm

Hi @Shanchao_Yang,

The method that rllib calls to do the padding is here:

github.com

ray-project/ray/blob/e973b726c2e4039bd5a35ecfffad5028a15cdfee/rllib/policy/rnn_sequencing.py#L32

    
      
          from ray.rllib.utils.typing import TensorType, ViewRequirementsDict
          from ray.util import log_once
          
          
tf1, tf, tfv = try_import_tf()
          torch, _ = try_import_torch()
          
          
logger = logging.getLogger(__name__)
          
          

          
@DeveloperAPI
          def pad_batch_to_sequences_of_same_size(
                  batch: SampleBatch,
                  max_seq_len: int,
                  shuffle: bool = False,
                  batch_divisibility_req: int = 1,
                  feature_keys: Optional[List[str]] = None,
                  view_requirements: Optional[ViewRequirementsDict] = None,
          ):
              """Applies padding to `batch` so it's choppable into same-size sequences.
          
          
    Shuffles `batch` (if desired), makes sure divisibility requirement is met,

The lines that actually pad with 0s is here:

github.com

ray-project/ray/blob/e973b726c2e4039bd5a35ecfffad5028a15cdfee/rllib/policy/rnn_sequencing.py#L267+272

    
      
          # Dynamically shrink max len as needed to optimize memory usage
          if dynamic_max:
              max_seq_len = max(seq_lens) + _extra_padding
          
          
feature_sequences = []
          for f in feature_columns:
              # Save unnecessary copy.
              if not isinstance(f, np.ndarray):
                  f = np.array(f)
              length = len(seq_lens) * max_seq_len
              if f.dtype == np.object or f.dtype.type is np.str_:
                  f_pad = [None] * length
              else:
                  # Make sure type doesn't change.
                  f_pad = np.zeros((length, ) + np.shape(f)[1:], dtype=f.dtype)
              seq_base = 0
              i = 0
              for len_ in seq_lens:
                  for seq_offset in range(len_):
                      f_pad[seq_base + seq_offset] = f[i]
                      i += 1

Shanchao_Yang · May 15, 2021, 1:30pm

Many thanks! RNN + action masking is not friendly if padding with zero

mannyv · May 15, 2021, 1:47pm

This is worth a longer discussion. I suppose it depends on your action distribution and its ability to handle zeros. As far as I remember these values are never passed into the environment so they are never used to produce real actions.They also are masked out in the losses, usually, there is still at least 1 algorthim that is not masking currently(marwil).

Shanchao_Yang · May 15, 2021, 2:05pm

Yes, zero padding is fine if it does not used in forward or calculating loss. Actually my policy model cannot handle all zeros input. My observation shape has variable-length, and I padded it, so I need to store a tensor to tell the real shape. Masking with zeros just cause problems when calling forward function.

mannyv · May 15, 2021, 3:52pm

The other thing you could do without having to change the underlying library is something like this in your forward function.

def forward(...)
   padded = input_batch["obs_flat" ].sum(axis=1) == 0 #axis might be wrong but I cannot check now
   #handle padded rows
   ....

mannyv · May 15, 2021, 6:13pm

@sven1977,

What do you think about having pad_batch_to_sequences_of_same_size add a mask to the sample_batch indicating which rows are real vs padded. There are already many places in the code base that have to reconstruct this mask in some way it would be better and reduce the bug surface to compute and store it once there.

Shanchao_Yang · May 16, 2021, 1:41am

Yes, this is a solution for torch, since we can handle the non-padded obs. But the variable-size observation is not friendly to tensorflow models.

sven1977 · May 19, 2021, 3:38pm

@mannyv , thanks for your suggestions! I agree, maybe we should store the boolean mask itself inside the sample batch. This would eliminate some (duplicate and repeated) code, I guess. Worth a try. On the other hand, the information is all completely there inside “seq_lens” and it’s really just doing a e.g. tf.boolean_mask(tf.sequence_mask(seq_lens, max_seq_len)). But yeah, I’d say we’ll do that. Would you like to do a PR to fix this @mannyv ?
Also great catch on MARWIL! We do say in the docs that it supports RNNs, but it’s not true (I’ll change that). The only off-policy RNN supporting algo afaik is currently R2D2. We can probably take some logic from it regarding burn-in and stuff.

Topic		Replies	Views
When are MARL replay buffers zero padded? RLlib	8	567	October 12, 2021
Issue creating custom action mask enviorment RLlib	14	2215	October 11, 2023
How to add LSTM layer to action masked example with New Api stack RLlib	1	17	July 14, 2025
LSTM and Attention on Stateless CartPole RLlib	5	1678	February 20, 2022
How to use LSTM or Attention Network action masking with nested dict action space? RLlib	0	264	August 24, 2023

How to using self-definded tensor as padding observations for LSTM/Attention models

Related topics