Interpretation of regression coefficients with different subsets of independent variables

by ricky116   Last Updated August 10, 2018 13:19 PM

I have a multiple regression problem. Let's say there is a physical system with a true model:

$$ y = b_0x_0 + b_1x_1 + b_2x_2 \;\;\;\;\;\;\;\;\;\; (1) $$

Now, imagine I only have access to a subset of the true independent variables, so that I fit the following model (let's assume the fitting procedure finds the best possible coefficients for the given data):

$$ y' = b_0'x_0 + b_1'x_1 \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (2) $$

The model produces a set of non-zero errors because my observations do not contain values for $x_2$.

My primary goal is to understand the regression coefficients. My understanding is that $b_0'$ and $b_1'$ will be inflated compared to $b_0$ and $b_1$ (assuming all coefficients are positive) because the contribution of $x_2$ to the prediction will be partially incorporated into the contributions from $x_0$ and $x_1$ (though imperfectly, resulting in the model errors). This should happen even if all variables are uncorrelated.
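A minimal sketch for checking this intuition numerically. It assumes synthetic Gaussian data, ordinary least squares via `numpy.linalg.lstsq`, and hypothetical true coefficients $b = (1, 2, 3)$; none of these come from the actual system. The parameter `rho` (also an assumption) sets the correlation between $x_2$ and $x_0$, so the reduced fit can be compared in the uncorrelated and correlated cases.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X, with no intercept as in (1) and (2)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(rho, n=100_000, seed=0):
    """Fit the reduced model (2) and the full model (1) on synthetic data.

    rho is the population correlation between x2 and x0 (x1 is independent of both).
    Returns (reduced_coefficients, full_coefficients).
    """
    rng = np.random.default_rng(seed)
    b = np.array([1.0, 2.0, 3.0])  # hypothetical true b_0, b_1, b_2
    x0 = rng.standard_normal(n)
    x1 = rng.standard_normal(n)
    # Build x2 with unit variance and correlation rho with x0.
    x2 = rho * x0 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    X_full = np.column_stack([x0, x1, x2])
    y = X_full @ b  # noise-free, as in model (1)
    reduced = fit_ols(X_full[:, :2], y)  # only x0 and x1 are observed
    full = fit_ols(X_full, y)
    return reduced, full


reduced_uncorr, full_uncorr = simulate(rho=0.0)
reduced_corr, full_corr = simulate(rho=0.8)

print("x2 uncorrelated with x0: b0', b1' =", reduced_uncorr)
print("x2 correlated with x0:   b0', b1' =", reduced_corr)
print("full model, both cases:  b =", full_uncorr, full_corr)
```

Under these assumptions, the reduced fit should stay close to $b_0 = 1$ when `rho=0.0` and move to roughly $1 + 0.8 \cdot 3 = 3.4$ when `rho=0.8`, which suggests that, at least for least squares, the shift in $b_0'$ is driven by the correlation between the omitted and the included variables.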

I want to report that the variable $x_0$ contributes $b_0'x_0$ to $y$, but as we can see by comparing (1) and (2), $b_0'$ seems to depend entirely on which other independent variables are included. This means that I can increase its contribution almost arbitrarily simply by removing more independent variables from the model.
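For reference, this dependence can be written down explicitly if model (2) is fitted by ordinary least squares (an assumption; the fitting method is not specified above). Here $X$ is the matrix whose columns are the observed values of $x_0$ and $x_1$, and $\mathbf{x}_2$ is the vector of unobserved $x_2$ values; both symbols are introduced only for this sketch. Substituting (1) into the least-squares solution gives:

$$ \begin{pmatrix} b_0' \\ b_1' \end{pmatrix} = (X^\top X)^{-1} X^\top y = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} + b_2 \, (X^\top X)^{-1} X^\top \mathbf{x}_2 $$

The second term is $b_2$ times the coefficients obtained by regressing $x_2$ on $x_0$ and $x_1$, so under this assumption the shift in $b_0'$ is set by how strongly the omitted variable is related to the included ones, and it vanishes when $\mathbf{x}_2$ is orthogonal to both columns of $X$ in the sample.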

My other concern is this: if I only have model (2), how do I know whether important $x_2$, $x_3$, $x_4$ terms (and so on) exist in model (1)? Do I just assume that I have a 'correct' model with sufficient independent variables when its error approaches 0? Furthermore, if I can never get close to 0 error, is there a technique that can estimate the error in the coefficients (i.e. how the coefficients might change if we did have knowledge of the 'missing' variables that would bring the error to 0)?

This is probably a large topic, so please let me know if there is a particular name for these concepts that I could investigate further.


