Feature selection in scikit-learn and multilinear regression

by DJname   Last Updated September 12, 2019 02:19 AM

I have a process with ~10 features and ~100 responses, and would like to search for models of how those features interact to produce the various responses. ~100 experiments were run, exploring combinations of feature values, so it's a limited training set. I was thinking about doing multilinear regression (possibly exploring quadratic terms), but I'm not sure what the most elegant/simple way is, in scikit-learn, to explore all possible model parameters and find the most convincing models.

If you have experience with this sort of problem (seems like a standard subset selection problem?) in scikit-learn or statsmodels, please give me some pointers. And please let me know if my question is unclear.

Answers 1

First, I am not sure I totally understand your notation. Are you saying that the number of data points divided by the number of features is ~10, i.e., you have 10 times as many data points as features? Big O notation without an n in it is really strange.

And what do you mean by $O(10^2)$ "experiments" were run? How do you define "experiments"? Is it the number of predictions?

Without fully understanding it, I will still try to answer. I would not recommend best-subset or stagewise feature selection. Instead, try using all of the features with regularization.

This is really a standard problem. Try searching for ridge regression. In R, glmnet can be used.
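Since the question asks about scikit-learn, here is a minimal sketch of the "all features plus regularization" idea, assuming a quadratic expansion of the 10 features and a ridge penalty chosen by cross-validation. The data are synthetic stand-ins with the shapes described in the question, and the `alphas` grid is an arbitrary choice rather than a recommendation.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (an assumption, not the asker's data):
# 100 experiments, 10 features, 100 responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_coef = rng.normal(size=(10, 100))
Y = X @ true_coef + 0.1 * rng.normal(size=(100, 100))

# Expand to linear + quadratic terms, put them on a common scale,
# then fit ridge regression with the penalty picked by cross-validation.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    RidgeCV(alphas=np.logspace(-3, 3, 13)),  # hypothetical grid of penalties
)
model.fit(X, Y)  # Y has one column per response; all are fitted at once

n_terms = model.named_steps["polynomialfeatures"].n_output_features_
chosen_alpha = model.named_steps["ridgecv"].alpha_
Y_pred = model.predict(X)

print("expanded terms:", n_terms)
print("selected alpha:", chosen_alpha)
```

With 10 features, the degree-2 expansion already has 65 terms against only ~100 experiments, which is why a penalty is likely a safer choice here than searching over subsets. Swapping `RidgeCV` for `MultiTaskLassoCV` would be one way to get sparse coefficients instead, if dropping terms is the goal.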

October 17, 2016 19:57 PM
