Multiple Regression, good P-value, but Low R2

by towi_parallelism   Last Updated September 18, 2018 14:19 PM

I am trying to build a model in R to predict Conversion Rate (CR) from age, gender, and interest (and also the campaign ID, xyz_campaign_id).

The CR values look like this:

[Figure: CR]

The correlation coefficients are not very promising:

library(Hmisc)  # provides rcorr()
rcorr(as.matrix(data.numeric))

Correlations with CR:

xyz_campaign_id (-0.19), age (-0.1), gender (-0.04), interest (-0.03)

So, the model below:

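library(caret)  # loaded here, but not used by the lines below
set.seed(100)   # make the random split reproducible
# Random 80/20 train/test split
TrainIndex <- sample(1:nrow(data), 0.8*nrow(data))
data.train <- data[TrainIndex,]
data.test <- data[-TrainIndex,]
nrow(data.test)  # size of the held-out set
# xyz_campaign_id enters the model as a numeric predictor
model <- lm(CR ~ age + gender + interest + xyz_campaign_id, data=data.train)
summary(model)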

does not achieve a good adjusted R-squared (about 0.046):

Call:
lm(formula = CR ~ age + gender + interest + xyz_campaign_id, 
    data = data.train)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.636 -11.858  -4.087   0.115  96.421 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     47.231250   6.287738   7.512  1.4e-13 ***
age35-39         1.214713   1.916649   0.634  0.52639    
age40-44        -1.971037   1.986316  -0.992  0.32131    
age45-49        -3.064858   1.866713  -1.642  0.10097    
genderM          3.709192   1.412311   2.626  0.00878 ** 
interest         0.030384   0.027617   1.100  0.27154    
xyz_campaign_id -0.037856   0.006076  -6.231  7.1e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.16 on 907 degrees of freedom
Multiple R-squared:  0.05237,   Adjusted R-squared:  0.04611 
F-statistic: 8.355 on 6 and 907 DF,  p-value: 7.81e-09
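
As a sanity check on the summary above, the adjusted R-squared and the F-statistic can be recomputed from the reported R-squared and degrees of freedom. This is only a sketch built from the printed numbers (the variable names are mine, not from the original code). It also hints at why a tiny p-value and a low R-squared can coexist: with roughly 900 residual degrees of freedom, even a small share of explained variance is statistically distinguishable from zero.

```r
# Values copied from the printed summary(model) output
r2  <- 0.05237       # Multiple R-squared
p   <- 6             # number of estimated slopes (numerator DF)
dfr <- 907           # residual degrees of freedom
n   <- dfr + p + 1   # training rows implied by the DFs

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / dfr

# Overall F-statistic and its p-value from the F distribution
f_stat <- (r2 / p) / ((1 - r2) / dfr)
p_val  <- pf(f_stat, p, dfr, lower.tail = FALSE)

print(c(n = n, adj_r2 = adj_r2, f_stat = f_stat, p_val = p_val))
```

The recomputed values should agree with the printed 0.04611 and 8.355 up to rounding of the inputs.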

I also understand that I should probably convert "interest" from numeric to a factor. I have tried that too, but I kept all 40 interest levels, which is not ideal.
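
One possible way to avoid carrying all 40 levels is to keep only the frequent interest codes and pool the rest into a single level. The sketch below is an illustration under assumptions, not something I have validated on the real data: the `interest` vector is simulated, the cut-off of 15 observations is arbitrary, and the "other" label is my own choice.

```r
# Simulated stand-in for the real column: 40 interest codes, unevenly used
set.seed(1)
interest <- sample(1:40, 500, replace = TRUE, prob = (40:1)^2)

# Count observations per code and keep only codes seen often enough
min_count <- 15                      # arbitrary threshold (assumption)
counts    <- table(interest)
keep      <- names(counts)[counts >= min_count]

# Pool every infrequent code into a single "other" level
interest_chr <- as.character(interest)
interest_f   <- factor(ifelse(interest_chr %in% keep, interest_chr, "other"))

nlevels(interest_f)   # far fewer than 40 levels
table(interest_f)     # counts per retained level
```

The resulting factor could then replace the numeric `interest` in the `lm()` formula, at the cost of one coefficient per retained level.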

So, based on the information provided, is there any way to improve the model? What other models should I try besides linear models to make sure that I have a good predictive model?

If you need more information, the challenge is available here.


