Hello, I want to add an LSTM to my PPO agent.
But I do not know how the LSTM is added to PPO.
I guess the LSTM is added to both the actor and the critic network, right?
And the actor and critic networks have the same structure, right?
Hello @mannyv, I didn't explain my question clearly.
If I use PPO with an LSTM, I mean that I set the model config as:
"model": {
    "fcnet_hiddens": [256, 64],
    "use_lstm": True,
    "lstm_cell_size": 64,
},
The action NN and value NN use the same network structure, right? I use the default config for vf_share_layers.
If I use PPO without an LSTM, I mean that I set the model config as:
"model": {
    "fcnet_hiddens": [256, 64, 16],
},
The action NN and value NN use the same network structure, right? I use the default config for vf_share_layers.
I did not set the config for vf_share_layers, so it is "vf_share_layers": DEPRECATED_VALUE. Does DEPRECATED_VALUE mean False?
The deprecation warning you are seeing is because that key now lives at config[model][vf_share_layers]. For PPO the default value is False. If you are using fully connected layers only, then by default it will be False and the two networks will not share layers. If you change it to True, they will share layers. This is the first image in the link I put in the last post.
If you include an LSTM with use_lstm=True, then the default value for config[model][vf_share_layers] will still be False, but the network will actually share layers. It is not currently possible to use an LSTM and not share layers. So when you add an LSTM, it does not matter what you set config[model][vf_share_layers] to; the layers will always be shared.
This is the second image in that link.
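For reference, a minimal sketch of where that key now lives, reusing the values from your example config (adjust as needed):

"model": {
    "fcnet_hiddens": [256, 64],
    "use_lstm": True,
    "lstm_cell_size": 64,
    "vf_share_layers": False,  # PPO default; ignored once use_lstm=True, layers are then always shared
},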
Thank you! I have read the RLlib code. In rnn_model.py and shared_weight_model.py, you define just one NN whose input size is observation_space.shape. You mainly distinguish between the value network and the policy network by setting different sizes in the output layer, right?
And the input size of the value network and the policy network is the same, right?
If you have a fully connected network, then by default PPO will set vf_share_layers to False. Under this setting you will have one architecture (same number of inputs, activation functions, and layer sizes; only the output sizes differ) but two independent copies of it with different random initializations.
If you change that value to True, then you have one network architecture and only one copy of the network. This time your model is multi-headed: the final layers for the policy and value functions share a parent layer, and they both take their input from the exact same hidden layer.
If you add an LSTM, then regardless of the vf_share_layers setting they will always share one network. This time the policy and value layers take the LSTM output as their input.
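If it helps, here is a rough PyTorch-style sketch of the three cases above (purely illustrative, not RLlib's actual model code; all sizes are made up):

import torch
from torch import nn

obs_dim, num_actions, hidden = 8, 4, 64

# Case 1: fully connected, vf_share_layers=False (PPO default).
# Same architecture, but two independent copies with separate weights.
policy_net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, num_actions))
value_net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

# Case 2: fully connected, vf_share_layers=True.
# One copy of the hidden layers; the policy and value heads read the same hidden layer.
shared_trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
policy_head = nn.Linear(hidden, num_actions)
value_head = nn.Linear(hidden, 1)

obs = torch.randn(1, obs_dim)
features = shared_trunk(obs)
logits, value = policy_head(features), value_head(features)

# Case 3: use_lstm=True. Regardless of vf_share_layers, both heads
# take the LSTM output as input, so the layers are always shared.
lstm = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
seq_features = shared_trunk(torch.randn(1, 5, obs_dim))  # (batch, time, features)
lstm_out, _ = lstm(seq_features)
logits_t, value_t = policy_head(lstm_out), value_head(lstm_out)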