In neural nets, the loss function often includes a weight regularization term, which discourages unnecessarily large weights. For example, if $C(\theta,x)$ is the baseline loss function, then the regularized loss is $L(\theta,x)=C(\theta,x)+\sum_i \ell(\theta_i)$, where $\ell(\theta_i)$ is usually $|\theta_i|$ ($L_1$ regularization) or $\theta_i^2$ ($L_2$ regularization).
However, nothing in principle prevents us from choosing a more complicated weight regularization term: we can have a generalized term $\mathcal L(\theta)$ that depends on the whole weight vector rather than on each weight separately.
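To make the setup concrete, here is a minimal sketch in plain Python of a loss with a pluggable regularizer. The names `elementwise_penalty`, `total_loss`, and `regularizer` are my own, chosen for illustration; `base_loss` stands in for an already-computed value of $C(\theta,x)$, and `regularizer` plays the role of $\mathcal L(\theta)$.

```python
def l1_term(w):
    """Per-weight L1 cost: absolute value."""
    return abs(w)


def l2_term(w):
    """Per-weight L2 cost: square."""
    return w * w


def elementwise_penalty(weights, term):
    """The usual case: sum a per-weight cost over all weights."""
    return sum(term(w) for w in weights)


def total_loss(base_loss, weights, regularizer):
    """Baseline loss plus an arbitrary function of the full weight list.

    `regularizer` may be any callable mapping the weights to a number,
    so it is not restricted to a sum of per-weight costs.
    """
    return base_loss + regularizer(weights)


weights = [1.0, -2.0]
print(total_loss(0.5, weights, lambda ws: elementwise_penalty(ws, l1_term)))
print(total_loss(0.5, weights, lambda ws: elementwise_penalty(ws, l2_term)))
```

The $L_1$ and $L_2$ cases are just two particular choices of `regularizer`; anything that can be computed from the weights fits the same slot.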
Why not tell our neural network that it should "look like a boolean circuit with 2 inputs per neuron"? We could choose sigmoid activation functions and use a weight regularization term that penalizes any neuron with more than $2$ non-zero incoming weights.
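One way this penalty could be written down, as a sketch only: counting non-zero weights directly is piecewise constant and gives no useful gradient, so I assume a surrogate that, for each neuron, sums the magnitudes of all incoming weights except the two largest. It is zero exactly when a neuron has at most two non-zero incoming weights. The function name `excess_fan_in_penalty` and the list-of-rows layout are assumptions made for the example.

```python
def excess_fan_in_penalty(weight_rows, max_inputs=2):
    """Penalize neurons that use more than `max_inputs` incoming weights.

    weight_rows: one list of incoming weights per neuron.
    For each neuron, the `max_inputs` largest magnitudes are free;
    every remaining weight contributes its absolute value.
    """
    total = 0.0
    for row in weight_rows:
        magnitudes = sorted((abs(w) for w in row), reverse=True)
        total += sum(magnitudes[max_inputs:])
    return total


layer = [
    [0.5, -1.0, 0.0],       # only two non-zero inputs: no penalty
    [0.1, 0.2, 0.3, -0.4],  # four non-zero inputs: the two smallest are penalized
]
print(excess_fan_in_penalty(layer))
```

Whether gradient descent with such a term actually converges to circuit-like networks is a separate question that this sketch does not answer.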
Why not have a separate neural network look at the original network's weights and its performance, and set the weight regularization cost for each weight separately?
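To show the shape of this idea, here is a sketch where each weight gets its own regularization coefficient. The function `toy_controller` is a purely hypothetical stand-in for the separate network: it is a hand-written rule, not a trained model, and its formula is arbitrary. Only the interface matters: weights and a performance number go in, one coefficient per weight comes out.

```python
def weighted_l1_penalty(weights, coefficients):
    """L1 penalty with a separate cost coefficient for every weight."""
    return sum(c * abs(w) for w, c in zip(weights, coefficients))


def toy_controller(weights, validation_loss):
    """Hypothetical placeholder for the second network.

    Arbitrary rule for illustration: scale all costs by the validation
    loss and charge small weights a higher coefficient than large ones.
    """
    return [validation_loss / (1.0 + abs(w)) for w in weights]


weights = [1.0, -3.0]
coefficients = toy_controller(weights, validation_loss=2.0)
print(coefficients)
print(weighted_l1_penalty(weights, coefficients))
```

In a real version the controller would itself have parameters to train, which is where the extra computational cost comes from.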
Has anything in this direction been done? Are there benefits (apart from the obvious computational downsides)?