# Multiple Regression, good P-value, but Low R2

by towi_parallelism   Last Updated September 18, 2018 14:19 PM

I am trying to build a model in R to predict conversion rate (CR) from age, gender, and interest (and also the campaign ID, `xyz_campaign_id`).

The CR values look like this:

The correlation coefficients are not very promising:

`rcorr(as.matrix(data.numeric))`

Correlations with CR:

xyz_campaign_id (-0.19), age (-0.1), gender (-0.04), interest (-0.03)

So, the model below:

``````
library(caret)  # loaded here, although the split below only uses base R
set.seed(100)

# 80/20 train/test split
TrainIndex <- sample(seq_len(nrow(data)), floor(0.8 * nrow(data)))
data.train <- data[TrainIndex, ]
data.test <- data[-TrainIndex, ]
nrow(data.test)

model <- lm(CR ~ age + gender + interest + xyz_campaign_id, data = data.train)
summary(model)
``````

has a poor adjusted R-squared (about 0.046):

``````
Call:
lm(formula = CR ~ age + gender + interest + xyz_campaign_id,
    data = data.train)

Residuals:
    Min      1Q  Median      3Q     Max
-18.636 -11.858  -4.087   0.115  96.421

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     47.231250   6.287738   7.512  1.4e-13 ***
age35-39         1.214713   1.916649   0.634  0.52639
age40-44        -1.971037   1.986316  -0.992  0.32131
age45-49        -3.064858   1.866713  -1.642  0.10097
genderM          3.709192   1.412311   2.626  0.00878 **
interest         0.030384   0.027617   1.100  0.27154
xyz_campaign_id -0.037856   0.006076  -6.231  7.1e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.16 on 907 degrees of freedom
Multiple R-squared:  0.05237,   Adjusted R-squared:  0.04611
F-statistic: 8.355 on 6 and 907 DF,  p-value: 7.81e-09
``````
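To see how a tiny p-value and a low R-squared can sit side by side, the F-statistic and adjusted R-squared can be recomputed from the numbers printed in the summary. This is only a sketch of the standard formulas; the variable names are mine, and the inputs are the rounded values from the output, so the results should match the printed ones only approximately (F near 8.35, adjusted R-squared near 0.046).

```r
# Values copied from the summary output (rounded as printed).
r2 <- 0.05237          # Multiple R-squared
k <- 6                 # number of estimated slopes (numerator DF)
df_resid <- 907        # residual degrees of freedom
n <- df_resid + k + 1  # implied number of training rows

# F-statistic for the overall regression, expressed through R-squared.
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Upper-tail p-value of the F distribution with (k, df_resid) DF.
p_value <- pf(f_stat, k, df_resid, lower.tail = FALSE)

print(round(c(F = f_stat, adj_R2 = adj_r2), 4))
print(p_value)
```

With roughly 900 residual degrees of freedom, even an R-squared of about 5% gives a large F-statistic, so the p-value says the predictors are not all useless, not that the fit is tight.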

I also understand that I should probably convert `interest` from numeric to a factor. I have tried that too, but I kept all 40 interest levels, which is not ideal.
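One way to avoid carrying all 40 levels would be to pool the rarely seen interest codes into a single "other" level before fitting. The following is a minimal base-R sketch on simulated data, not my real data: the simulated frequencies, the `min_n` threshold, and the `interest_f` column name are all assumptions made for illustration.

```r
# Simulated stand-in for the real data (hypothetical values).
set.seed(100)
n <- 1000
data <- data.frame(
  # 8 common interest codes and 32 rare ones (assumed distribution)
  interest = sample(1:40, n, replace = TRUE, prob = c(rep(5, 8), rep(1, 32))),
  CR = rexp(n, rate = 1 / 15)
)

# Pool interest codes with fewer than min_n rows into "other".
min_n <- 30  # assumed threshold, not taken from the post
interest_chr <- as.character(data$interest)
counts <- table(interest_chr)
rare <- names(counts)[counts < min_n]
interest_chr[interest_chr %in% rare] <- "other"
data$interest_f <- factor(interest_chr)

# Number of levels after pooling, and their sizes.
print(nlevels(data$interest_f))
print(table(data$interest_f))
```

The pooled `interest_f` column could then replace `interest` in the model formula. The threshold trades detail for stability, so it would need tuning on the real data.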

So, based on the information above, is there any way to improve the model? What other models should I try besides linear models to get a good predictive model?
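Since the goal is prediction, one way to compare candidate models is their error on the held-out `data.test` rows rather than R-squared on the training set. Below is a minimal sketch on simulated data. The column names follow the post, but the values, the `30-34` baseline age group, the placeholder campaign IDs, and the `rmse` helper are assumptions made for illustration.

```r
# Simulated stand-in for the real data (hypothetical values).
set.seed(100)
n <- 1000
data <- data.frame(
  age = factor(sample(c("30-34", "35-39", "40-44", "45-49"), n, replace = TRUE)),
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  interest = sample(1:40, n, replace = TRUE),
  xyz_campaign_id = sample(c(100, 200, 300), n, replace = TRUE),  # placeholder IDs
  CR = rexp(n, rate = 1 / 15)
)

# Same 80/20 split as in the post.
TrainIndex <- sample(seq_len(n), floor(0.8 * n))
data.train <- data[TrainIndex, ]
data.test <- data[-TrainIndex, ]

model <- lm(CR ~ age + gender + interest + xyz_campaign_id, data = data.train)

# Root mean squared error (hypothetical helper).
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Compare the model with a baseline that always predicts the training mean.
pred <- predict(model, newdata = data.test)
rmse_model <- rmse(data.test$CR, pred)
rmse_baseline <- rmse(data.test$CR, mean(data.train$CR))

print(c(model = rmse_model, baseline = rmse_baseline))
```

Any alternative model can be dropped in place of `lm` here. If it does not beat the mean-only baseline on the test rows, it is not adding predictive value.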