I have a multi-variate regression problem. Let's say there is a physical system with a true model:
$$ y = b_0x_0 + b_1x_1 + b_2x_2 \;\;\;\;\;\;\;\;\;\; (1) $$
Now, imagine I only have access to a subset of the true independent variables, and I fit the following model (assume the fitting procedure itself is exact for the given data):
$$ y' = b_0'x_0 + b_1'x_1 \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (2) $$
The model produces a set of non-zero errors because my observations do not contain values for $x_2$.
My primary goal is to understand the regression coefficients. My understanding is that $b_0'$ and $b_1'$ will be inflated compared to $b_0$ and $b_1$ (assuming all coefficients are positive) because part of the contribution of $x_2$ to $y$ will be absorbed into the estimated contributions of $x_0$ and $x_1$ (imperfectly, which produces the model errors). I believe this happens when $x_2$ is correlated with the included variables, but I'm not sure whether it also happens when all the variables are mutually uncorrelated.
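To make my question concrete, here is a small numpy sketch (the coefficient values and data are invented for illustration) that generates data from a full model like (1) and then fits the reduced model (2) by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b0, b1, b2 = 1.0, 2.0, 3.0  # hypothetical "true" coefficients

x0 = rng.normal(size=n)
x1 = rng.normal(size=n)

def fitted_coefs(x2):
    """Generate y from the full 3-variable model, then regress on x0, x1 only."""
    y = b0 * x0 + b1 * x1 + b2 * x2
    X = np.column_stack([x0, x1])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# x2 uncorrelated with x0 and x1
print(fitted_coefs(rng.normal(size=n)))

# x2 correlated with x0
print(fitted_coefs(0.5 * x0 + rng.normal(size=n)))
```

The second call makes $x_2$ correlated with $x_0$ so that both cases can be compared.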
I want to report that the variable $x_0$ contributes $b_0'x_0$ to $y$, but comparing (1) and (2), it seems like $b_0'$ depends entirely on which other independent variables were selected. This means I could inflate its contribution almost arbitrarily simply by removing more independent variables from the model.
My other concern is this: if I only have model (2), how do I know whether important $x_2$, $x_3$, $x_4$ (and so on) terms exist in model (1)? Should I simply assume that I have a 'correct' model with sufficient independent variables once its error approaches 0? Furthermore, if I can never get the error near 0, is there a technique to estimate the error in the coefficients, i.e. how much the coefficients might decrease if we did have knowledge of the 'missing' variables that would produce 0 error?
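To illustrate what I mean by "error approximates 0": here is a sketch (again with invented, noiseless data) that adds candidate predictors one at a time and watches the residual error, which only vanishes once all the true variables are included:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
X_full = rng.normal(size=(n, 3))          # columns: x0, x1, x2
y = X_full @ np.array([1.0, 2.0, 3.0])    # noiseless "true" model (1)

for k in (1, 2, 3):
    X = X_full[:, :k]                     # fit with only the first k predictors
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = np.sqrt(np.mean((y - X @ coef) ** 2))
    print(f"{k} predictors: RMSE = {rmse:.4f}")
```

With all three true predictors the RMSE drops to (numerically) zero; with fewer, the residual error reflects the omitted terms. My question is whether this criterion is sound in practice, where I don't know how many variables the true model has.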
This is probably a large topic, so please let me know if there is a particular name for these concepts that I could investigate further.