Custom LSTM Model, how to define the SEQ_LEN

ColdFrenzy · April 3, 2022, 2:51pm

How severe does this issue affect your experience of using Ray?

Low: It annoys or frustrates me for a moment.

Hello everyone,
I’m trying to create a custom actor-critic model with LSTM similar to this:

ray-project/ray/blob/4795048f1b3779658e8b0ffaa05b1eb61914bc60/rllib/examples/models/rnn_model.py

import numpy as np

from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.preprocessors import get_preprocessor
from ray.rllib.models.tf.recurrent_net import RecurrentNetwork
from ray.rllib.models.torch.recurrent_net import RecurrentNetwork as TorchRNN
from ray.rllib.utils.annotations import override
from ray.rllib.utils.framework import try_import_tf, try_import_torch

tf1, tf, tfv = try_import_tf()
torch, nn = try_import_torch()


class RNNModel(RecurrentNetwork):
    """Example of using the Keras functional API to define a RNN model."""

    def __init__(self,
                 obs_space,
                 action_space,
                 num_outputs,

This file has been truncated.

But with the addition of two different LSTMs, one for the actor and one for the critic.

One thing that is not clear to me is how I should set the time length of the LSTM. It looks like RLlib sets it to 32 by default in my code, but I never used that value anywhere in my configuration:

            "use_lstm": False,
            "max_seq_len": 20,
            "lstm_cell_size": 256,
            "lstm_use_prev_action": False,
            "lstm_use_prev_reward": False,
            "_time_major": False,
            "custom_model_config": {
                "share_weights": False,
                "shared_fc_layers": ([128, 64, 32],),
                "fc_layers": ([], []),
                "cell_size": 30
            },
            "rollout_fragment_length": 20 if args.debug else 200,
            "train_batch_size": 400 if args.debug else 4000,
            "sgd_minibatch_size": 25 if args.debug else 256,
            "shuffle_sequences": True,
            "num_sgd_iter": 30,

Do you know how to control this parameter in a custom LSTM model?

mannyv · April 3, 2022, 3:54pm

Hi @ColdFrenzy,

Can you share how you determined the 32 value?

The time length will vary depending on if you are in the sample phase or the train phase.

In the sample phase the time dimension will be 1, because RLlib generates actions one step at a time.

During the training phase, with your configuration, the maximum size of the time dimension will be 20, based on your max_seq_len setting. That also serves as the truncation length for TBPTT (truncated backpropagation through time). It is likely you will have sequences that are shorter than 20, though. This will happen if your episode is shorter than 20 steps, or if you are using the truncate_episodes batch mode and sampling pauses in the middle of an episode because you have hit your rollout_fragment_length.
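To make the training-phase layout concrete, here is a minimal pure-Python sketch of how episodes could be cut into sequences of at most max_seq_len steps and then zero-padded into rows. It is a simplified illustration, not RLlib's own implementation; the helper names `chop_into_seq_lens` and `pad_rows` and the episode lengths 45 and 7 are made up for the example.

```python
from typing import List


def chop_into_seq_lens(episode_lens: List[int], max_seq_len: int) -> List[int]:
    """Split each episode into chunks of at most max_seq_len steps."""
    seq_lens: List[int] = []
    for ep_len in episode_lens:
        full, rest = divmod(ep_len, max_seq_len)
        seq_lens.extend([max_seq_len] * full)
        if rest:
            # The tail of an episode yields a sequence shorter than max_seq_len.
            seq_lens.append(rest)
    return seq_lens


def pad_rows(steps: List[int], seq_lens: List[int], max_seq_len: int,
             pad: int = 0) -> List[List[int]]:
    """Lay a flat list of steps out as B rows of exactly max_seq_len entries."""
    rows, start = [], 0
    for n in seq_lens:
        rows.append(steps[start:start + n] + [pad] * (max_seq_len - n))
        start += n
    return rows


# Hypothetical data: two episodes of 45 and 7 steps, max_seq_len=20 as in the config.
seq_lens = chop_into_seq_lens([45, 7], max_seq_len=20)
rows = pad_rows(list(range(1, 53)), seq_lens, max_seq_len=20)
print(seq_lens)                 # [20, 20, 5, 7]
print(len(rows), len(rows[0]))  # 4 20
```

In this sketch the model would see a batch of B=4 sequences with a time dimension of 20, and seq_lens tells it how many steps in each row are real rather than padding.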

Hope that helps.

ColdFrenzy · April 3, 2022, 6:39pm

Hi @mannyv,
thanks for the info. Actually, I inserted a breakpoint inside the forward call of my custom LSTM model and looked at seq_len, which is a torch tensor of size 32 containing all ones. From what you are telling me, the ones are probably the time dimension, since I’m in the sampling phase.
The strange thing is that the batch size is then 32, but I don’t think I have set it to 32 anywhere in my config.
I continued debugging, and at some point seq_len becomes tensor([8, 8, 8, 8], dtype=torch.int32).
P.S. Just to give a little context: in my environment I have three agents, each observing a state of dimension 18, where 9 values are the real observations and 9 are the action_mask observations.
So in the end I’d like to understand a little better how batch sizes and time dimensions are handled in general, not necessarily in the context of my environment.

EDIT:
By following the function calls, I’ve ended up here

ray-project/ray/blob/master/rllib/policy/policy.py#L892

                stats_fn (Optional[Callable[[Policy, SampleBatch], Dict[str,
                    TensorType]]]): An optional stats function to be called after
                    the loss.
            """
            # Signal Policy that currently we do not like to eager/jit trace
            # any function calls. This is to be able to track, which columns
            # in the dummy batch are accessed by the different function (e.g.
            # loss) such that we can then adjust our view requirements.
            self._no_tracing = True
            sample_batch_size = max(self.batch_divisibility_req * 4, 32)
            self._dummy_batch = self._get_dummy_batch_from_view_requirements(
                sample_batch_size
            )
            self._lazy_tensor_dict(self._dummy_batch)
            actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
                self._dummy_batch, explore=False
            )
            for key, view_req in self.view_requirements.items():
                if key not in self._dummy_batch.accessed_keys:
                    view_req.used_for_compute_actions = False

where the batch size is defined as 32, and consequently the seq_len is defined here:

ray-project/ray/blob/master/rllib/policy/policy.py#L926

                    self.view_requirements[key] = ViewRequirement()
                self.view_requirements[key].used_for_compute_actions = True
            self._dummy_batch = self._get_dummy_batch_from_view_requirements(
                sample_batch_size
            )
            self._dummy_batch.set_get_interceptor(None)
            self.exploration.postprocess_trajectory(self, self._dummy_batch)
            postprocessed_batch = self.postprocess_trajectory(self._dummy_batch)
            seq_lens = None
            if state_outs:
                B = 4  # For RNNs, have B=4, T=[depends on sample_batch_size]
                i = 0
                while "state_in_{}".format(i) in postprocessed_batch:
                    postprocessed_batch["state_in_{}".format(i)] = postprocessed_batch[
                        "state_in_{}".format(i)
                    ][:B]
                    if "state_out_{}".format(i) in postprocessed_batch:
                        postprocessed_batch["state_out_{}".format(i)] = postprocessed_batch[
                            "state_out_{}".format(i)
                        ][:B]
                    i += 1

mannyv · April 4, 2022, 12:18am

@ColdFrenzy,

Follow my posts in this thread. They might have some of the info you are interested in.

That code is initialization code. There are about 3 calls to forward at the very beginning, before training starts, that are used to set up ViewRequirements.
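The two shapes reported earlier in the thread (a seq_len of 32 ones, then [8, 8, 8, 8]) are consistent with the arithmetic in the quoted policy.py excerpts. The sketch below only reproduces that arithmetic in plain Python. The function names are hypothetical, and the even split of the 32 rows over B=4 sequences is an assumption inferred from the `B = 4` comment and the observed tensor, not something confirmed in this thread.

```python
from typing import List, Tuple


def dummy_batch_sizes(batch_divisibility_req: int = 1,
                      B: int = 4) -> Tuple[int, List[int]]:
    """Mirror the excerpt: dummy batch size first, then B equal-length sequences."""
    sample_batch_size = max(batch_divisibility_req * 4, 32)
    # Assumption: the dummy rows are split evenly over the B sequences.
    seq_len = sample_batch_size // B
    return sample_batch_size, [seq_len] * B


def infer_batch_and_time(num_rows: int, seq_lens: List[int]) -> Tuple[int, int]:
    """Recover (B, T) for a padded flat batch made of B sequences of T steps."""
    B = len(seq_lens)
    return B, num_rows // B


num_rows, loss_seq_lens = dummy_batch_sizes()
print(num_rows, loss_seq_lens)                         # 32 [8, 8, 8, 8]
# Action-computation pass: every row is its own length-1 sequence.
print(infer_batch_and_time(num_rows, [1] * num_rows))  # (32, 1)
# Loss-initialization pass: 4 sequences of 8 steps each.
print(infer_batch_and_time(num_rows, loss_seq_lens))   # (4, 8)
```

Neither 32 nor 8 comes from max_seq_len or any other user setting here, so a custom forward is probably safer deriving the time dimension from the incoming batch and seq_lens than hard-coding it.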

Federica_Tonti · May 9, 2024, 7:42pm

Hi @ColdFrenzy, I am experiencing the same problem and it has completely blocked my work. It is driving me crazy. Could you explain how you solved the issue? Thanks!

Marko_Blanusa · June 10, 2024, 9:34pm

Me too, I’m getting the exact same problem with the batch being 32 by default when arriving in the forward method with the seq_lens tensor being full of 1s.

And I suspect this is causing the error:

File "C:\Users\marko\anaconda3\envs\rllib-torch\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 181, in __init__
raise e.args[0].args[2]
RuntimeError: shape '[5504, 1]' is invalid for input of size 12288

I don’t know where this shape and input size are coming from. My environment outputs states of shape (96, 36) representing one sequence per step.
My entire dataset is batched with a size of 128.

If anyone has an idea that would be a big help.
