Best way to handle dependency of hyper-parameters

In a neural network, one may want to tune structural hyper-parameters such as the number of blocks, the number of layers within those blocks, and the number of units within those layers, as shown for instance in the simple metamodel in the image below.


What is the best way to model parameters that only make sense when the value of another one is strictly greater than 0? For instance, in the provided image, if the optimizer chooses not to create layer block j, the parameters relative to that block become moot. The same applies if, for instance, the block exists but contains no layers, or the units are set to 0.
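To make the dependency concrete: one common way to express this kind of conditional search space is a define-by-run style (as in, e.g., Optuna), where child parameters are only sampled inside the branch in which the parent makes them active, so inactive blocks never produce "dead" parameters. Here is a minimal, library-free sketch of that idea; the function name, bounds, and structure are illustrative assumptions, not taken from any particular framework:

```python
import random

def sample_architecture(rng, max_blocks=3, max_layers=4, max_units=128):
    """Sample structural hyper-parameters hierarchically.

    Child parameters (layers per block, units per layer) are drawn
    only when the parent value makes them active, so a configuration
    never contains parameters that have no effect.
    """
    arch = []
    n_blocks = rng.randint(0, max_blocks)  # 0 blocks is a valid choice
    for _ in range(n_blocks):
        # n_layers exists only because this block was created.
        n_layers = rng.randint(0, max_layers)
        # Units are sampled only for layers that actually exist.
        layers = [rng.randint(1, max_units) for _ in range(n_layers)]
        arch.append(layers)
    return arch

rng = random.Random(0)
arch = sample_architecture(rng)
```

With this structure the optimizer only ever sees parameters that are active in the sampled configuration, instead of carrying around values for blocks or layers that were never built.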

How is this situation handled? Should we avoid creating such elasticity in the first place?