How does Vowpal Wabbit handle NA's or missing values?

by Frank P.   Last Updated September 01, 2018 22:19 PM

I'm working on a problem that involves a large number of NAs. How does VW work around this? Should I impute the NAs with colmeans or something similar before converting the data to VW format?

Answers 1

To elaborate on my answer:

Let's say the first line of your data is:

y, v1, v2, v3
10, 5, NA, 3

The VW string encoding of that line is:

10 | v1:5 v2:NA v3:3

As you probably discovered, v2:NA doesn't work in VW, because the part after the colon needs to be a number. (Note the space after the |: without it, VW would read v1 as a namespace name rather than a feature.)

An easy solution is to find :NA in your VW string and replace it with _NA:

10 | v1:5 v2_NA v3:3

This works fine in VW: a feature with no explicit value is given the value 1, so v2_NA is read as v2_NA:1.

This will allow the model to learn what happens when v2 is NA, and how that differs from the case where it is known.
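If your VW lines are already written out, the replacement can be done with a short script. This is only a sketch: flag_missing is a made-up helper name, and it assumes missing values appear as the literal text NA right after the colon. The example line has a space after the | so that v1 is parsed as a feature and not as a namespace.

```python
import re

# Match ":NA" only when it ends a feature token, i.e. it is followed by
# whitespace or the end of the line. A value such as ":NAN" is left alone.
_NA_VALUE = re.compile(r":NA(?=\s|$)")


def flag_missing(vw_line: str) -> str:
    """Rewrite features like 'v2:NA' as the indicator feature 'v2_NA'.

    Hypothetical helper, not part of VW itself.
    """
    return _NA_VALUE.sub("_NA", vw_line)


print(flag_missing("10 | v1:5 v2:NA v3:3"))
# prints: 10 | v1:5 v2_NA v3:3
```

Lines without any :NA pass through unchanged, so it is safe to run this over the whole file.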

You could impute medians, but it's probably a better idea to:

  1. Compute an "NA flag" for each variable that is 1 when the variable is NA and 0 when it is not.
  2. Omit each variable from the lines of your VW training file where its value is NA.
  3. Train on the resulting file, which includes the flags but no NA values.

This will let VW build a model that predicts one thing for an NA variable and another when it is present.
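The steps above can be sketched in Python if you are starting from a CSV file. Treat this as an illustration under assumptions: row_to_vw is a made-up helper, the label is assumed to be in column y, and missing values are assumed to be the literal string NA. A flag of 0 contributes nothing to a linear model, so the flag feature is only written when it is 1.

```python
import csv
import io


def row_to_vw(row: dict, label: str = "y", na: str = "NA") -> str:
    """Turn one CSV row into a VW line (hypothetical helper, not part of VW).

    A missing variable is omitted and replaced by a '<name>_NA' flag, which
    VW reads as '<name>_NA:1'. Present variables are written as name:value.
    """
    feats = []
    for name, value in row.items():
        if name == label:
            continue
        value = value.strip()
        if value == na:
            feats.append(f"{name}_NA")        # flag = 1, variable omitted
        else:
            feats.append(f"{name}:{value}")   # flag = 0, so it is left out
    # The space after '|' puts the features in the default namespace.
    return f"{row[label].strip()} | " + " ".join(feats)


# Small in-memory example; the second row is invented for illustration.
data = "y, v1, v2, v3\n10, 5, NA, 3\n7, 2, 6, NA\n"
reader = csv.DictReader(io.StringIO(data), skipinitialspace=True)
lines = [row_to_vw(r) for r in reader]

for line in lines:
    print(line)
```

For a real dataset, replace the io.StringIO object with an open file handle and write each returned line to the training file you pass to VW.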

February 18, 2015 20:04 PM
