I am wondering what the intended architecture is when using PPO with vf_share_layers = False and an LSTM. Based on the comment in the PPO config and the documentation, I would have expected that setting it to False means the layers above the LSTM are not shared between the policy and the value function.
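For reference, the combination of settings I mean looks roughly like the following. This is only a minimal sketch written as a plain dict: it assumes the keys sit under "model" (some RLlib versions expose vf_share_layers at the top level of the PPO config instead), and the layer sizes are placeholder values rather than my actual config.

```python
# Sketch of the relevant model settings as a plain Python dict.
# Assumption: keys live under "model"; sizes are illustrative placeholders.
config = {
    "model": {
        # Ask for separate policy and value-function networks.
        "vf_share_layers": False,
        # Wrap the model with an LSTM.
        "use_lstm": True,
        # Placeholder sizes for the fully connected layers and the LSTM cell.
        "fcnet_hiddens": [256, 256],
        "lstm_cell_size": 256,
    },
}
```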
When I looked at the LSTMWrapper code, I realized that the policy and the value function share those layers even when vf_share_layers=False.
Here is a picture of the network with use_lstm=False. To make it clearer, I added 3 to the size of the _value_branch_separate model. As you can see, the layers are not shared. The policy is on the left (size 256) and the vf is on the right (size 259).
Here is a picture of the network with use_lstm=True. In this one, the layers going into the LSTM are shared. I could not get the ordering to match, so here the value function is on the left and the policy is on the right.
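To put the two pictures side by side in text form, here is a rough sketch of the layouts as I read them, both with vf_share_layers=False. The layer names and the helper functions are made up for illustration and are not RLlib identifiers. Treating the LSTM itself as shared is my assumption from the single LSTM in the picture.

```python
def observed_layout(use_lstm: bool) -> dict:
    """Layer names per branch as I read the pictures (vf_share_layers=False).

    All names here are hypothetical labels, not RLlib attributes.
    """
    if not use_lstm:
        # Two fully separate stacks: nothing in common.
        return {
            "policy": ["policy_fc", "policy_out"],
            "value": ["value_fc", "value_out"],
        }
    # With the LSTM wrapper: both heads sit on the same trunk.
    # Assumption: the LSTM itself is shared as well as the layers feeding it.
    return {
        "policy": ["shared_fc", "lstm", "policy_out"],
        "value": ["shared_fc", "lstm", "value_out"],
    }


def shared_layers(layout: dict) -> list:
    """Return the layer names that appear in both branches, in policy order."""
    return [name for name in layout["policy"] if name in layout["value"]]


print(shared_layers(observed_layout(use_lstm=False)))
print(shared_layers(observed_layout(use_lstm=True)))
```

The first call finds no common layers, while the second one does, which is the difference I am asking about.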
My question, then, is whether this is the intended behavior and, if so, whether it can be documented somewhere.