Hi @A_M,
and welcome to the forum! This is a known issue (see e.g. here).
The `VisionNet` needs the right filters. What does "right" mean here? It means that at the end of all the filtering the output must have dimension `[B, 1, 1, F]`, where `B` stands for the batch size (it is not part of the `conv_filters` attribute) and `F` for the number of feature maps (the number of "neurons" the following `Dense` layer should connect to).
In your last example you do exactly this: applying `[32, [15, 15], 1]` to a `[B, 15, 15, 3]` image results in a `[B, 1, 1, 32]` output (`"valid"` padding is used). The first example does not result in a `[B, 1, 1, F]` output: using three filters with kernel `[5, 5]` and stride `1` results in a size of `[B, 15, 15, 128]` (note that `"same"` padding is used for all `Conv2D` layers but the last). What you want to do is set the filters with strides larger than `1`:
"conv_filters": [
[32, [5, 5], 2],
[64, [5, 5], 2],
[128, [5, 5], 2],
[256, [4, 4], 1]
],
Hope this helps.
Simon