It would be great to have a more complete description of how RLlib handles the observation pipeline that feeds LSTM/RNN models, and whether there is any difference between the built-in option and a custom implementation. In particular: how does it change what a typical batch looks like, how does the max_seq_len value affect the pipeline, and should it change your choice of batch mode (e.g. "complete_episodes")?
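For concreteness, these are the settings the question is about. This is only a minimal sketch assuming RLlib's dict-style config; the values are placeholders, not recommendations, and the comments describe the options only as far as the question itself does.

```python
# Illustrative config fragment (placeholder values, dict-style config assumed).
config = {
    # Batch mode in question; the other option is "truncate_episodes".
    "batch_mode": "complete_episodes",
    "model": {
        # The built-in LSTM option, as opposed to a custom RNN model.
        "use_lstm": True,
        # The max_seq_len value whose effect on the batch is being asked about.
        "max_seq_len": 20,
    },
}
```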