by krishna
Last Updated August 27, 2018 10:19 AM

While learning machine learning, I have come to understand that bootstrapping is considered a powerful approach for correcting overfitting in estimates of a model's predictive ability. Based on what I have gathered, I have tried the following approach in my work (logistic regression to predict a binary outcome from 50 predictor variables):

- Fit the model to the full original data.
- Perform feature selection with a backward stepwise algorithm on 100 bootstrap samples, and select the variables that are chosen most frequently.
- Repeat the following 1000 times:
  a. Generate a bootstrap sample
  b. Split the bootstrap sample into training (70%) and test (30%) sets
  c. Fit the selected model (from step 2) to the training data
  d. Apply the fitted model to the test data
  e. Calculate the Area Under the Receiver Operating Characteristic curve (AUROC)
- Report the average AUROC with a 95% confidence interval.
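To make the repeat-resample-split-score loop (steps 3–4) concrete, here is a minimal sketch in Python using scikit-learn. The data `X`, `y` and all variable names are illustrative stand-ins, not from my actual problem, and I use 200 iterations here for speed rather than the 1000 described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                      # stand-in for the selected predictors
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)  # stand-in binary outcome

aurocs = []
for _ in range(200):  # 200 for speed; the procedure above uses 1000
    # a. generate a bootstrap sample (resample rows with replacement)
    idx = rng.integers(0, len(y), size=len(y))
    Xb, yb = X[idx], y[idx]
    # b. split the bootstrap sample 70/30 (stratified so both classes appear in the test set)
    X_tr, X_te, y_tr, y_te = train_test_split(Xb, yb, test_size=0.3, stratify=yb)
    # c. refit the selected model on the training portion
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # d./e. score the fitted model on the held-out portion via AUROC
    aurocs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

mean_auc = np.mean(aurocs)
ci = np.percentile(aurocs, [2.5, 97.5])  # percentile-based 95% interval
print(f"AUROC {mean_auc:.3f} (95% CI {ci[0]:.3f}-{ci[1]:.3f})")
```

Note that the interval here is just the 2.5th/97.5th percentiles of the AUROC values across iterations, which is one simple way to report the "95% CI" in step 4.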

First, I would appreciate it if someone could comment on whether there is any problem with this approach.

Second, am I right in understanding that I am basically evaluating the predictive ability of the selected variables, but not the model itself? During each iteration the model is fit to different data, so it will have different estimated coefficients each time. Can we really say that we are evaluating the model?

I will be very thankful to anyone who can help me clear up this confusion!

Best wishes,
