Prediction model validation using bootstrapping

by krishna   Last Updated August 27, 2018 10:19 AM

In the course of learning machine learning, I have come to understand that bootstrapping is considered a powerful approach for correcting for overfitting in measures of a model's predictive ability. Based on the information I have gathered, I have tried the following approach in my work (logistic regression to predict a binary outcome with 50 predictor variables):

  1. Fit the model to the full original data
  2. Feature selection with a backward stepwise algorithm over 100 bootstraps (and select the variables that are chosen most frequently)
  3. Repeat the following 1000 times (a rough code sketch of this loop follows the list):
     a. Generate a bootstrapped dataset
     b. Split the bootstrapped data into training (70% of the data) and test (30% of the data) sets
     c. Fit the selected model (from step 2) to the training data
     d. Apply the fitted model to the test data
     e. Calculate the Area Under the Receiver Operating Characteristic curve (AUROC)
  4. Report the average AUROC with a 95% confidence interval.
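To make steps 3 and 4 concrete, here is a minimal Python sketch of the loop, assuming scikit-learn; the function name `bootstrap_auroc` and the inputs `X` (the predictors restricted to the variables selected in step 2) and `y` (the binary outcome) are my own illustrative assumptions, not a fixed implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

def bootstrap_auroc(X, y, n_repeats=1000, test_size=0.3, seed=0):
    """Steps 3-4: bootstrap, 70/30 split, fit, and score with AUROC."""
    rng = np.random.RandomState(seed)
    aurocs = []
    for _ in range(n_repeats):
        # a. generate a bootstrapped dataset (resample rows with replacement)
        X_boot, y_boot = resample(X, y, random_state=rng)
        # b. split the bootstrapped data into 70% training / 30% test
        X_train, X_test, y_train, y_test = train_test_split(
            X_boot, y_boot, test_size=test_size, random_state=rng)
        # c. fit the selected model to the training data
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        # d. + e. apply it to the test data and compute the AUROC
        aurocs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
    aurocs = np.asarray(aurocs)
    # 4. average AUROC with a 95% percentile interval over the repeats
    return aurocs.mean(), np.percentile(aurocs, [2.5, 97.5])
```

It would be called as `mean_auc, ci = bootstrap_auroc(X_selected, y)`, where `X_selected` contains only the variables chosen in step 2.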

First, I would appreciate it if someone could comment on whether there are any problems with this approach.

Second, am I right in understanding that I am basically evaluating the selected variables for their predictive ability, rather than the model itself? During each iteration the model is fit to different data, so the model will be different each time, with different estimated coefficients. So can we really say that we are evaluating the model?

I would be very thankful to anyone who could help me out with this confusion!

Best wishes,


