I am running some linear regressions in R. I am dealing with a continuous dependent variable and continuous as well as categorical independent variables, using lm. So far, I have looked at the output that summary(model) gives me.
I also looked at Anova() from the car package, whose documentation describes it as follows: "Calculates type-II or type-III analysis-of-variance tables for model objects."
I am under the impression that Anova() returns an F-statistic instead of the t-statistic but is roughly equivalent in what it tells me (sample output below). So I was wondering:
Are standard R summary(lm) and car's Anova(lm) indeed doing pretty much the same calculations here? If not, what is the difference?
They both report the same p-values; however, the F-statistic at the bottom of the standard output is different from the Anova() ones. Why is that?
What are applications where one would choose one over the other?
Any help is much appreciated!
summary(linreg)
...
            Estimate   t value   Pr(>|t|)
Age          -18.016    -3.917   0.000107
Gender      -45.4912    -4.916   1.35e-06
---
Residual standard error: 85.81 on 359 degrees of freedom
F-statistic: 16.71 on 2 and 359 DF,  p-value: 1.147e-07
Anova(linreg)
Anova Table (Type II tests)
            Sum Sq   F value      Pr(>F)
Age         112997    15.345   0.0001072
Gender     1777936    24.164   1.348e-06
The tests are not the same in general, but in this case they agree, and they always agree for a two-parameter model (an intercept and one slope). The t-test in the summary table compares the full model

$$ y = \beta_0 + \beta_1 x \tag{1} $$

with the model $y = \beta_1 x$ for testing $\beta_0 = 0$, and compares $y = \beta_0$ with (1) for testing $\beta_1 = 0$. The F-test in a sequential anova table instead compares $y = 0$ with $y = \beta_0$ for testing $\beta_0 = 0$, and then compares $y = \beta_0$ with $y = \beta_0 + \beta_1 x$ for testing $\beta_1 = 0$. For the slope the two comparisons coincide, so the tests give the same p-value, with $F = t^2$.

With several explanatory variables, the results of a sequential table (such as the one from base R's anova()) will differ from the t-tests, and the difference is more apparent when the variables are correlated. A sequential F-test can also change with the ordering of the explanatory variables. The Type II tests that Anova() reports by default do not depend on the ordering: each term is tested after all the others, so for a term with a single degree of freedom and no interactions the F-test again reproduces the t-test, which is why the p-values in your output match.

An anova table is therefore preferable if you suspect correlation between the explanatory variables (that is, that they may explain the same variation in the response), while the t-tests are best as a first step of assessment. Hope that answers all your questions.
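The relationship between these tests can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are simulated (the variable names, sample size and coefficients are made up and are not the data from the question), and it uses base R only, with drop1() standing in for marginal, Type II style tests in a model without interactions and anova() giving sequential tests.

```r
# Simulated data (hypothetical; NOT the data from the question)
set.seed(1)
n <- 200
age <- rnorm(n, mean = 40, sd = 10)
# Make gender depend on age so the two predictors are correlated
gender <- rbinom(n, size = 1, prob = plogis((age - 40) / 10))
y <- 500 - 2 * age - 40 * gender + rnorm(n, sd = 80)

fit <- lm(y ~ age + gender)

# t statistics from summary(): each coefficient tested given all the others
t_vals <- coef(summary(fit))[c("age", "gender"), "t value"]

# drop1() removes each term in turn from the full model; without
# interactions these marginal F tests correspond to Type II tests
f_marginal <- drop1(fit, test = "F")[c("age", "gender"), "F value"]

# For single-degree-of-freedom terms, F should equal t^2
t2_equals_f <- isTRUE(all.equal(unname(t_vals^2), f_marginal))
print(t2_equals_f)

# Sequential (Type I) tests from anova() depend on the order of the terms
f_seq_age_first <- anova(lm(y ~ age + gender))["age", "F value"]
f_seq_age_last  <- anova(lm(y ~ gender + age))["age", "F value"]
print(c(first = f_seq_age_first, last = f_seq_age_last))

# The F-statistic at the bottom of summary() tests all slopes jointly,
# i.e. the full model against the intercept-only model
f_overall    <- unname(summary(fit)$fstatistic["value"])
f_vs_nullfit <- anova(lm(y ~ 1), fit)$F[2]
print(c(summary = f_overall, nested = f_vs_nullfit))
```

In this sketch the sequential F for age changes with its position in the formula, and when age is entered last it coincides with the marginal test. The overall F-statistic is a joint test of both predictors, which is why it does not match either row of the Anova() table.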