I want to customize the neural network structure of my reinforcement learning agent, and I found that the model config can be defined directly. But I have a question: can "use_lstm" and "fcnet_hiddens" be used at the same time?
If I set the PPO agent's model config like this:
"model": {
    "fcnet_hiddens": [256, 128],
    "fcnet_activation": "tanh",
    "use_lstm": True,
    "max_seq_len": 20,
},
Welcome to Ray and RLlib. Yes, the two settings are used together. The network would look like this:
FC(256) → FC(128) → LSTM(256) → FC(num_actions)
There is also a second branch from the LSTM output for the value function; there is only one LSTM here, not two:
LSTM(256) → FC(1)
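To see the two settings working together, here is a minimal sketch (assuming the RLlib 1.x Trainer API, the torch framework, and CartPole-v1 as a placeholder environment) that builds a PPO policy with this config and prints the resulting network:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

trainer = PPOTrainer(config={
    "env": "CartPole-v1",          # placeholder environment
    "framework": "torch",
    "num_workers": 0,              # keep it lightweight for a quick check
    "model": {
        "fcnet_hiddens": [256, 128],
        "fcnet_activation": "tanh",
        "use_lstm": True,          # wraps the FC stack with an LSTM
    },
})

# Printing the policy's model shows the FC layers, the LSTM wrapper,
# and the separate logits and value-function heads.
print(trainer.get_policy().model)
```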
One word on your layer sizes: this would work, but it is a common convention that the LSTM should have the same number of units as the previous layer or fewer, not more. That setting is lstm_cell_size. The max_seq_len setting means that backpropagation through time for the LSTM will be truncated every 20 environment steps.
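So with fcnet_hiddens of [256, 128], a model config that follows that convention could look like this (just a sketch; the values are only an illustration):

```python
# Model config following the convention above: LSTM no larger than the last FC layer.
model_config = {
    "fcnet_hiddens": [256, 128],
    "fcnet_activation": "tanh",
    "use_lstm": True,
    "lstm_cell_size": 128,   # same size as (or smaller than) the last FC layer
    "max_seq_len": 20,       # truncate backprop through time after 20 steps
}
```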
It depends on what you mean. I am pretty sure that the policy that produces the logits used to choose an action can have an LSTM in it, but I think you are right that the model used by the Curiosity exploration module does not support RNNs.
It is also the case that Curiosity exploration can only be used if num_workers is 0.
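For what it's worth, a sketch of what that looks like in a config (based on the documented Curiosity exploration settings; the specific values here are only illustrative defaults, and only the torch framework supports this module as far as I know):

```python
config = {
    "framework": "torch",      # Curiosity is implemented for the torch framework
    "num_workers": 0,          # required when using the Curiosity exploration module
    "exploration_config": {
        "type": "Curiosity",
        "eta": 1.0,            # weight of the intrinsic (curiosity) reward
        "lr": 0.001,           # learning rate for the curiosity sub-networks
        "feature_dim": 288,    # dimensionality of the learned feature embedding
        "sub_exploration": {
            "type": "StochasticSampling",  # exploration applied on top of the intrinsic reward
        },
    },
}
```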