by DJname
Last Updated September 12, 2019 02:19 AM

I have a process with ~10 features and ~100 responses, and would like to search for models of how those features interact to create the various responses. ~100 experiments were run, exploring combinations of values of the features, so it's a limited training set. I was thinking about doing multilinear regression (possibly exploring quadratic terms), but I'm not sure what the most elegant/simple way is, in scikit-learn, to explore all possible model parameters to find the most convincing models.

If you have experience with this sort of problem (seems like a standard subset selection problem?) in scikit-learn or statsmodels, please give me some pointers. And please let me know if my question is unclear.

First, I am not sure I fully understand your notation. Are you saying that the number of data points divided by the number of features is ~10, i.e., that you have 10 times as many data points as features? Big O notation without an n in it is really strange.

And what do you mean by $O(10^2)$ "experiments" were run? How do you define "experiments"? Is it the number of predictions?

Without fully understanding it, I will still try to answer. I would not recommend best subset or stagewise feature selection. Instead, use all the features with regularization.

This really is a standard problem. Look up ridge regression. In R, the glmnet package can be used.
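Since the question asks about scikit-learn, here is a minimal sketch of the "all features plus regularization" idea. It is only an illustration under assumptions: the data are synthetic placeholders with the sizes from the question (100 experiments, 10 features, 100 responses), the quadratic expansion and the grid of penalty strengths are arbitrary choices, and `RidgeCV` picks a single penalty shared by all responses by default.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption): replace X and Y with the real experiments.
rng = np.random.default_rng(0)
n_samples, n_features, n_responses = 100, 10, 100
X = rng.normal(size=(n_samples, n_features))
W = rng.normal(size=(n_features, n_responses))
interaction = 0.5 * (X[:, [0]] * X[:, [1]])  # one made-up interaction term
Y = X @ W + interaction + 0.1 * rng.normal(size=(n_samples, n_responses))

# Keep every linear, squared, and pairwise term; let the ridge penalty
# shrink the coefficients instead of selecting a subset of features.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    RidgeCV(alphas=np.logspace(-3, 3, 13)),  # penalty grid is an arbitrary choice
)

# Out-of-sample R^2, averaged over all responses, for each of 5 folds.
scores = cross_val_score(model, X, Y, cv=5, scoring="r2")

# Fit on all data and inspect the coefficients: one row per response,
# one column per expanded feature.
model.fit(X, Y)
coef = model.named_steps["ridgecv"].coef_
print(scores.mean(), coef.shape)
```

With so few experiments, the cross-validated score is a more trustworthy guide than the in-sample fit, and large coefficients on the interaction columns hint at which feature pairs matter for a given response.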
