If the observation includes action masks, padding with all-zeros observations for the LSTM is not what I expect: my code requires the action masks to be non-zero. However, current RLlib uses zeros to pad the inputs for LSTM models. A related problem can be found here. I tried to replace the zero padding with non-zero padding, but I could not find where in the code the LSTM inputs are padded.
@sven1977 Could you point me to the location of this function? Many thanks.
I am also not sure whether checking each row of the batched input_dict for an all-zero action mask is a good solution. Maybe there is a better one?
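To illustrate the workaround I have in mind, here is a minimal sketch (the helper name `is_padding` and the shapes are my own, not RLlib's): it treats a row as a padded timestep when its action mask is all zeros, assuming every real observation has at least one legal action.

```python
import torch

def is_padding(action_mask: torch.Tensor) -> torch.Tensor:
    """Return a boolean vector marking padded rows.

    action_mask: [B*T, num_actions] tensor of 0/1 legal-action flags.
    A row with no legal action at all is assumed to be zero padding
    inserted by the sequence batching, not a real observation.
    """
    return ~action_mask.bool().any(dim=-1)

# Example: the middle row is an all-zero (padded) mask.
mask = torch.tensor([[1, 0, 1],
                     [0, 0, 0],
                     [0, 1, 0]], dtype=torch.float32)
print(is_padding(mask))  # tensor([False,  True, False])
```

Inside the model's forward pass one could skip or neutralize these rows, but this feels fragile, which is why I am asking whether there is a cleaner place to change the padding itself.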