Can a random forest based feature selection method be used for multiple regression in machine learning?

by tiantianchen   Last Updated December 07, 2017 09:19 AM

I would like a good feature selection method for a continuous response variable with around 100 predictors. I want to keep my model a linear multiple regression model, rather than a tree-based model.

My current method: I calculate the (linear) correlation between each predictor and the response, and select the subset of predictors with "strong" correlations for the final multiple regression. The prediction performance of the selected predictors would then be assessed in this final multiple regression model. However, feature selection done this way is subjective, and I am afraid of missing "important" features.
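A minimal sketch of this correlation-based filter, using simulated data in place of the (unspecified) real predictors; the cutoff of 0.3 is an arbitrary stand-in for the subjective threshold described above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Simulated stand-in data: 10 predictors, response driven by X0 and X1
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 10)),
                 columns=[f"X{i}" for i in range(10)])
y = 2 * X["X0"] - 3 * X["X1"] + rng.normal(size=200)

# Pearson correlation of each predictor with the response
corr = X.apply(lambda col: np.corrcoef(col, y)[0, 1]).abs()

# Subjective cutoff, as noted in the question
threshold = 0.3
selected = corr[corr > threshold].index.tolist()

# Fit the final multiple regression on the selected subset only
model = LinearRegression().fit(X[selected], y)
```

One weakness this sketch makes visible: a predictor that matters only jointly with others (or nonlinearly) can have a near-zero marginal correlation and be dropped.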

I would like to apply a more objective and complete feature selection method, such as the "all-relevant" feature selection in Boruta or the variable importance measure of a random forest. However, as I understand it, both methods are based on tree-based random forests, which are not linear regression models.
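One common pattern (a sketch under the assumption that scikit-learn is available; the data here are again simulated) is to use random forest variable importance only to rank and shortlist features, then fit the final linear model on that shortlist. Note the caveat: the forest may rank a feature highly for a nonlinear relationship that the linear model cannot exploit.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Simulated stand-in data: 20 predictors, response driven by columns 0 and 3
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
y = 1.5 * X[:, 0] + 2.0 * X[:, 3] + rng.normal(size=300)

# Step 1: fit a random forest purely to obtain importance scores
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Step 2: keep the top-ranked features (top 5 here, an arbitrary choice)
top = np.argsort(rf.feature_importances_)[::-1][:5]

# Step 3: the final model is still a plain multiple linear regression
linear = LinearRegression().fit(X[:, top], y)
```

The forest is discarded after ranking; only the linear model on the shortlisted columns is used for prediction.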

My questions are:

  1. Is my current method appropriate for my research purpose?

  2. Can random forest based feature selection be used to select features for a multiple linear regression model?

  3. Are there any other feature selection methods recommended?

Thanks


