fcnet_hiddens and LSTM settings

I want to customize the neural network structure of my reinforcement learning agent, and I found that the model config can be defined directly.
But I have a question: can "use_lstm" and "fcnet_hiddens" be used at the same time?
If I set the PPO agent's model like this:

    "model": {
        "fcnet_hiddens": [256, 128],
        "fcnet_activation": "tanh",
        "use_lstm": True,
        "max_seq_len": 20,
        "lstm_cell_size": 256,
    },

What will PPO's neural network look like?

Hi @zzchuman,

Welcome to Ray and RLlib. Yes, the two settings are used together. The network would look like this:

FC(256) → FC(128) → LSTM(256) → FC(num_actions)

There is also a second branch from the LSTM for the value function (there is only one LSTM here, not two):
LSTM(256) → FC(1)

One word on your layer sizes: this would work, but a common convention is that the LSTM (sized via lstm_cell_size) should have the same number of units as, or fewer than, the preceding layer, not more. max_seq_len means that backpropagation through time for the LSTM is truncated every 20 environment steps.
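
In case it helps, here is a minimal sketch of how you would pass such a config to PPO, using the trainer API from the RLlib version current at the time of this thread; the environment is just a placeholder:

    import ray
    from ray.rllib.agents.ppo import PPOTrainer

    ray.init()
    trainer = PPOTrainer(
        env="CartPole-v0",  # placeholder environment
        config={
            "framework": "torch",
            "model": {
                "fcnet_hiddens": [256, 128],
                "fcnet_activation": "tanh",
                "use_lstm": True,
                "max_seq_len": 20,
                # Per the convention above, matching the last FC layer (128)
                # would be more typical than 256:
                "lstm_cell_size": 128,
            },
        },
    )
    print(trainer.get_policy().model)  # inspect the auto-wrapped network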


Thanks for your reply! I got it!
And I have another question about Curiosity and LSTM.
I find that an LSTM cannot be used with the Curiosity module, right?

Depends on what you mean. I am pretty sure that the policy that produces the logits used to choose an action can have an LSTM in it, but I think you are right that the model used by the Curiosity exploration module does not support RNNs.

It is also the case that Curiosity exploration can only be used if num_workers is 0.
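
For reference, a Curiosity setup in the RLlib version from around this time looks roughly like the config below; parameter names such as eta and feature_dim come from that version's Curiosity exploration module and may differ in other releases, so treat this as a sketch:

    config = {
        "framework": "torch",  # RLlib's Curiosity module is torch-only
        "num_workers": 0,      # required for Curiosity, as noted above
        "exploration_config": {
            "type": "Curiosity",
            "eta": 1.0,          # weight of the intrinsic reward
            "lr": 0.001,         # learning rate of the curiosity nets
            "feature_dim": 288,  # size of the learned feature vector
            "sub_exploration": {
                "type": "StochasticSampling",
            },
        },
    }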

Hi @mannyv, many thanks for your contributions to this community.

I was wondering if you could add more FC layers immediately downstream of the LSTM; specifically, I saw a setting called post_FC_hidden.

Thanks in advance.

Hi @mickelliu,

Unfortunately that will not work. The LSTM wrapper is hardcoded to create one final hidden layer between the LSTM and the outputs here.

You could do this if you wrote a custom model. It should be pretty straightforward to copy that wrapper and add that feature.
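
If you go that route, a skeleton might look like the sketch below. This is an assumption-heavy illustration, not the wrapper itself: the class name LSTMWithPostFC, the layer sizes, and the hard-coded post-LSTM stack are all made up for the example; it only follows the pattern of RLlib's torch RecurrentNetwork base class:

    import numpy as np
    import torch
    import torch.nn as nn
    from ray.rllib.models import ModelCatalog
    from ray.rllib.models.torch.recurrent_net import RecurrentNetwork
    from ray.rllib.utils.annotations import override

    class LSTMWithPostFC(RecurrentNetwork, nn.Module):
        """Illustrative model: obs -> FC -> LSTM -> extra post-LSTM FC -> heads."""

        def __init__(self, obs_space, action_space, num_outputs, model_config, name):
            nn.Module.__init__(self)
            super().__init__(obs_space, action_space, num_outputs, model_config, name)
            obs_size = int(np.product(obs_space.shape))
            self.cell_size = 256
            self.fc = nn.Sequential(nn.Linear(obs_size, 256), nn.Tanh())
            self.lstm = nn.LSTM(256, self.cell_size, batch_first=True)
            # The extra layers this thread asks about, placed after the LSTM:
            self.post_fc = nn.Sequential(nn.Linear(self.cell_size, 128), nn.Tanh())
            self.logits_branch = nn.Linear(128, num_outputs)
            self.value_branch = nn.Linear(128, 1)
            self._features = None

        @override(RecurrentNetwork)
        def get_initial_state(self):
            return [np.zeros(self.cell_size, np.float32),
                    np.zeros(self.cell_size, np.float32)]

        @override(RecurrentNetwork)
        def forward_rnn(self, inputs, state, seq_lens):
            # inputs: [B, T, obs_size]; state: two [B, cell_size] tensors.
            x = self.fc(inputs)
            h, c = state[0].unsqueeze(0), state[1].unsqueeze(0)
            lstm_out, (h, c) = self.lstm(x, (h, c))
            self._features = self.post_fc(lstm_out)
            logits = self.logits_branch(self._features)
            return logits, [h.squeeze(0), c.squeeze(0)]

        @override(RecurrentNetwork)
        def value_function(self):
            return torch.reshape(self.value_branch(self._features), [-1])

    ModelCatalog.register_custom_model("lstm_with_post_fc", LSTMWithPostFC)

With that registered, you would point the trainer config at it via "model": {"custom_model": "lstm_with_post_fc", "max_seq_len": 20}.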
