@mannyv thanks for the reply. Yes, it’s only in SB2; the link in the original post points to that (see also the linked paper, which has a clearer illustration).
When you request a wrapper, the model catalog instantiates the wrapped model without a final output layer and creates the final (logits) layer in the wrapper instead.
To me it seems that the section you mentioned (with or without net_arch) clearly applies the RNN on the latent features BEFORE any of the policy or vf layers. I think this might also be the case for RLlib, but it’s slightly obfuscated there.
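To illustrate that ordering (a minimal sketch only, not SB2 or RLlib code; all class and attribute names here are made up for illustration):

```python
import torch
import torch.nn as nn

class RecurrentPolicySketch(nn.Module):
    """Illustrates the ordering: feature extractor -> RNN -> policy/vf heads."""

    def __init__(self, obs_dim=8, feat_dim=32, hidden_dim=16, num_actions=4):
        super().__init__()
        self.extractor = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # The policy and value layers come AFTER the recurrent layer.
        self.pi_head = nn.Linear(hidden_dim, num_actions)
        self.vf_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs, state=None):
        feats = self.extractor(obs)          # latent features
        out, state = self.rnn(feats, state)  # RNN runs on the latents
        return self.pi_head(out), self.vf_head(out), state

m = RecurrentPolicySketch()
logits, value, state = m(torch.randn(2, 5, 8))  # (batch, time, obs_dim)
print(logits.shape, value.shape)  # torch.Size([2, 5, 4]) torch.Size([2, 5, 1])
```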
I think what’s happening is that when the recurrent wrapper is used, the num_outputs that’s used to instantiate the wrapped class is None, so self._logits is also None, which evaluates to False. This leads the wrapped class’ forward function to use the return conv_out, state case instead of return logits, state (see here), i.e. it passes the latent vector to the wrapper and not the output of the logits layer (self._logits).
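Paraphrasing that branching (a hedged sketch of the logic only, not the actual RLlib source; the real code is linked above):

```python
# Sketch of the wrapped model's forward-pass branching, assuming a
# VisionNetwork-like structure where the logits layer is only built
# when num_outputs is given at construction time.
def forward_sketch(conv_out, logits_layer=None, state=None):
    if logits_layer is not None:
        # Normal, unwrapped case: a logits layer was built.
        return logits_layer(conv_out), state
    # Wrapped case: num_outputs was None, so no logits layer exists;
    # the latent features themselves are returned to the wrapper.
    return conv_out, state

features, state = forward_sketch([0.1, 0.2, 0.3])
print(features)  # [0.1, 0.2, 0.3] -- the raw latents, not logits
```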
So if that’s correct, then I think it actually answers my main question:
Or am I missing something and if the wrapper is used then the wrapped policy outputs the features?
Yes, when the wrapper is used then the wrapped policy passes the extracted features to the wrapper.
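For reference, the wrapper is requested purely through the model config; something like the following (a minimal sketch using the standard RLlib model-config keys):

```python
# Minimal model-config sketch that triggers the recurrent (LSTM) wrapper
# in RLlib; the base model is then built with num_outputs=None and its
# latent features are fed into the wrapper's LSTM.
model_config = {
    "use_lstm": True,       # wrap the base model with the LSTM wrapper
    "lstm_cell_size": 256,  # hidden size of the wrapper's LSTM
    "max_seq_len": 20,      # how sequences are chunked for training
}
print(model_config["use_lstm"])  # True
```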
Edit: one thing that might be a potential bug is that the post_fcnet_hiddens layers are not applied when the wrapper is used (the wrapper’s _logits_branch should be constructed similarly to the wrapped class’ _logits). post_fcnet_hiddens was added recently to VisionNetwork (and I guess to the other models as well), and maybe that change was not carried over to the wrappers.
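To make the suspicion concrete, the fix would be for the wrapper to build its output branch with the post_fcnet_hiddens layers in front of the final layer, the way the wrapped class builds its own logits. A hypothetical sketch (build_logits_branch and its signature are made up for illustration, not RLlib API):

```python
import torch.nn as nn

def build_logits_branch(in_size, num_outputs, post_fcnet_hiddens):
    """Hypothetical helper: an output branch that applies the
    post_fcnet_hiddens layers before the final logits layer."""
    layers, size = [], in_size
    for hidden in post_fcnet_hiddens:
        layers += [nn.Linear(size, hidden), nn.ReLU()]
        size = hidden
    layers.append(nn.Linear(size, num_outputs))
    return nn.Sequential(*layers)

# Two hidden post-FC layers of 64 units before the 4-way output layer.
branch = build_logits_branch(256, 4, post_fcnet_hiddens=[64, 64])
```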