@PREJAN, as you can see in the source code of the ComplexInputNetwork, this network may itself contain multiple sub-models. Since you did not post your action and observation spaces, we cannot give a precise answer here, but you might find your model under algo.get_policy().model.logits_and_value_model
It strikes me that the input layer has 256 dimensions (rather than the observation space’s 32*12=384) and that value_out has one dimension (rather than the action space’s 3), but I’m still at an early learning stage and there are many concepts I don’t grasp yet. Could it be that this prints only part of the model?
@PREJAN, I’m glad you found a way to print your model to better understand its architecture.
As your obs space has two dimensions, the ComplexInputNetwork is chosen automatically.
Regarding your understanding of the inputs and outputs: take a look at the Input layer of the ComplexInputNetwork. This input does not come directly from the observation space but from the post_fc_stack, so it is a pre-processed embedding. The embedding size can be set in your config via config["model"]["post_fcnet_hiddens"] (the default is 256).
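To make the config part concrete, here is a minimal sketch of the "model" sub-dict with post_fcnet_hiddens set. The layer sizes are illustrative only, and embedding_size is a hypothetical helper (not an RLlib function) that just reads off the last hidden size, falling back to the 256 default mentioned above; treat that fallback as an assumption taken from this post.

```python
# Illustrative "model" sub-dict of an RLlib config; values are examples only.
config = {
    "model": {
        # Hidden layer sizes of the post-processing FC stack. The last size
        # is the width of the embedding fed into the logits/value layers.
        "post_fcnet_hiddens": [128, 64],
    },
}


def embedding_size(model_config, default=256):
    """Hypothetical helper: width of the last post-FC layer, else `default`.

    The 256 fallback mirrors the default stated in the post above and is an
    assumption, not something read from RLlib itself.
    """
    hiddens = model_config.get("post_fcnet_hiddens") or []
    return hiddens[-1] if hiddens else default


print(embedding_size(config["model"]))  # 64
print(embedding_size({}))               # 256
```

Changing the last entry of post_fcnet_hiddens is what would change the "256" you see on the printed Input layer, assuming the embedding comes from the post_fc_stack as described.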
As for the output, the model has two output branches, namely value_out and logits. The former is the estimate of the value function and should have dimension 1. The latter is the action output and, for a continuous action space, should have 2 x action_space.shape[0] units, so 6 here. Why is this? What do you think?
I don’t want to delay my answer any longer and seem rude. I’ve researched and tried to find the answer to the question, but I’m still not quite there… My best guess now would be a mean and a distribution for each action space dimension? It’s still unclear to me, but the research is helping me understand PPO a bit better, so thank you for that. For now, I understand that it has actor and critic networks: the critic outputs one single number representing the value of the state we ended up in (as you explained in your comment), as a way to criticise the action proposed by the actor network. And I understand that the output of the actor network is probabilities of actions (thus… mean and distribution), and that’s where I’m still a bit lost, but I hope I’ll get there eventually.
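The guess is close. A minimal sketch of a diagonal Gaussian action distribution shows why 3 action dimensions need 6 logits: the first half are treated as per-dimension means and the second half as per-dimension log standard deviations (the "spread" of the distribution). This layout is, to my understanding, what RLlib's diagonal Gaussian uses for Box action spaces, but treat it as an assumption; the function names below are hypothetical and written in plain Python for illustration.

```python
import math
import random


def split_logits(logits):
    """Split [mean_1..mean_n, log_std_1..log_std_n] into two halves.

    Assumed layout: first half = means, second half = log std devs.
    """
    if len(logits) % 2 != 0:
        raise ValueError("expected an even number of logits")
    n = len(logits) // 2
    return logits[:n], logits[n:]


def sample_action(logits, rng=random):
    """Draw one independent normal sample per action dimension."""
    means, log_stds = split_logits(logits)
    return [rng.gauss(m, math.exp(s)) for m, s in zip(means, log_stds)]


def log_prob(action, logits):
    """Log density of `action` under the diagonal Gaussian (summed over dims)."""
    means, log_stds = split_logits(logits)
    total = 0.0
    for a, m, s in zip(action, means, log_stds):
        std = math.exp(s)
        total += -0.5 * ((a - m) / std) ** 2 - s - 0.5 * math.log(2 * math.pi)
    return total


logits = [0.1, -0.3, 0.5, 0.0, 0.0, -1.0]  # 6 = 2 x 3 action dimensions
means, log_stds = split_logits(logits)
print(means)     # [0.1, -0.3, 0.5]
print(log_stds)  # [0.0, 0.0, -1.0]
print(len(sample_action(logits)))  # 3
```

Predicting the log of the standard deviation keeps the actual standard deviation positive after exp(). The log_prob of a sampled action is the quantity PPO compares between the old and new policy when it forms its probability ratio.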
I’ll also try to turn my observation space into a 1D array to get a simpler model. It does not represent an image, so I guess CNNs are not relevant, and a not-so-“complex” input network would simplify my learning path.
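The flattening itself is a simple row-by-row reshape. A minimal plain-Python sketch, assuming each observation arrives as 32 rows of 12 values; flatten_obs is a hypothetical helper, and with gym/gymnasium the FlattenObservation wrapper does the same job for Box spaces. The declared observation space would then also need shape (384,) to match the flattened data.

```python
def flatten_obs(obs_2d):
    """Hypothetical helper: flatten a 2D observation (list of rows) row by row."""
    return [value for row in obs_2d for value in row]


rows, cols = 32, 12
# Dummy observation whose entries encode their own row-major index.
obs = [[r * cols + c for c in range(cols)] for r in range(rows)]
flat = flatten_obs(obs)
print(len(flat))  # 384 == 32 * 12
```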
@mannyv Thanks also for that tip, it was very clarifying too. I guess it’s called a complex input network because it’s composed first of some CNN layers to deal with the 2D images, then fully connected layers to do the brain magic, and finally the logits and value outputs expected by the actor and the critic, respectively. I hope this is close enough not to sound stupid :)