Custom LSTM Model, how to define the SEQ_LEN

How severe does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hello everyone,
I’m trying to create a custom actor-critic model with LSTM similar to this:

But with the addition of two different LSTMs, one for the actor and one for the critic, as in

One thing that is not clear to me is how I should set the time length of the LSTM. It looks like RLlib sets it to 32 by default in my code, but I never used that value anywhere in my configuration:

            "use_lstm": False,
            "max_seq_len": 20,
            "lstm_cell_size": 256,
            "lstm_use_prev_action": False,
            "lstm_use_prev_reward": False,
            "_time_major": False,
            "custom_model_config": {
                "share_weights": False,
                "shared_fc_layers": ([128, 64, 32],),
                "fc_layers": ([], []),
                "cell_size": 30
            },
            "rollout_fragment_length": 20 if args.debug else 200,
            "train_batch_size": 400 if args.debug else 4000,
            "sgd_minibatch_size": 25 if args.debug else 256,
            "shuffle_sequences": True,
            "num_sgd_iter": 30,
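As a sanity check, here is the batch arithmetic implied by the non-debug values above (a sketch, assuming, as in RLlib's PPO, that these sizes are counted in environment timesteps):

```python
# Sketch of the batch arithmetic implied by the non-debug config above
# (assumption: all sizes are counted in environment timesteps).
train_batch_size = 4000
rollout_fragment_length = 200
max_seq_len = 20

# Number of rollout fragments collected per train batch:
fragments_per_batch = train_batch_size // rollout_fragment_length
# Upper bound on sequences per fragment after max_seq_len chopping:
seqs_per_fragment = rollout_fragment_length // max_seq_len

print(fragments_per_batch, seqs_per_fragment)  # → 20 10
```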

Do you know how to control this parameter in a custom LSTM model?

Hi @ColdFrenzy,

Can you share how you determined the 32 value?

The time length will vary depending on whether you are in the sampling phase or the training phase.

In the sampling phase the time dimension will be 1, because RLlib generates actions one step at a time.

During the training phase, with your configuration, the maximum size of the time dimension will be 20, based on your max_seq_len setting. That value also serves as the truncation length for truncated BPTT. It is likely you will have sequences shorter than 20, though. This will happen if an episode is shorter than 20 steps, or if you are using the truncate_episodes batch mode and sampling pauses in the middle of an episode because you have hit your rollout_fragment_length.
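To illustrate the chopping described above, here is a small pure-Python sketch (not RLlib's actual implementation) of how episode lengths turn into sequence lengths under a max_seq_len of 20:

```python
def chop_into_seq_lens(episode_lengths, max_seq_len=20):
    """Sketch of truncated-BPTT chopping: each episode is split into
    chunks of at most max_seq_len steps, so short episodes and leftover
    tails produce sequences shorter than max_seq_len."""
    seq_lens = []
    for remaining in episode_lengths:
        while remaining > 0:
            chunk = min(remaining, max_seq_len)
            seq_lens.append(chunk)
            remaining -= chunk
    return seq_lens

# A 45-step episode leaves a 5-step tail; a 7-step episode stays whole:
print(chop_into_seq_lens([45, 7]))  # → [20, 20, 5, 7]
```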

Hope that helps.


Hi @mannyv ,
thanks for the info. I inserted a breakpoint inside the forward call of my custom LSTM model and saw that seq_lens is a torch tensor of size 32 filled with ones. From what you are telling me, the ones are probably the time dimension, since I'm in the sampling phase.
The strange thing, then, is that the batch size is 32, but I don't think I set it to 32 anywhere in my config.
I continued debugging and at some point seq_lens becomes tensor([8, 8, 8, 8], dtype=torch.int32).
P.S. Just to give a bit of context: in my environment I have three agents, each observing a state space of dimension 18, where 9 are the real observations and 9 are the action-mask observations.
So in the end I'd like to understand a little better how batch sizes and time dimensions are handled in general, not necessarily in the context of my environment.
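For what it's worth, the shapes above can be reproduced with a small pure-Python sketch of what the add-time-dimension step does conceptually during training (an illustration, not RLlib's actual torch code): a flat batch of 32 rows with seq_lens = [8, 8, 8, 8] becomes a padded [4, 8] view.

```python
def add_time_dimension(flat_rows, seq_lens, pad_value=0):
    """Group a flat [B*T] batch into a padded [num_seqs, max_seq_len] view,
    the way seq_lens is used during training (pure-Python illustration)."""
    max_len = max(seq_lens)
    padded, i = [], 0
    for length in seq_lens:
        seq = flat_rows[i:i + length]
        # Sequences shorter than max_len get padded, as during training.
        seq = seq + [pad_value] * (max_len - length)
        padded.append(seq)
        i += length
    return padded

# 32 flat rows with seq_lens [8, 8, 8, 8] → 4 sequences of length 8:
batched = add_time_dimension(list(range(32)), [8, 8, 8, 8])
print(len(batched), len(batched[0]))  # → 4 8
```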

By following the function calls, I’ve ended up here

where the batch size is defined as 32 and, consequently, seq_lens is defined as


Follow my posts in this thread. They might have some of the info you are interested in.

That code is initialization code. There are about three calls to forward at the very beginning, before training starts, that are used to set up the ViewRequirements.
