Well-performing model can't predict similar dataset with identical features

by Xiao Wei Chen   Last Updated July 12, 2018 01:19 AM

I have conversion rate data from two largely similar marketing campaigns: source & target.

The source (1M events, 3% conversion) and target (20k events, 1% conversion) data are drawn from the same underlying data source and share the same ~10 numerical and categorical features (e.g. day of week, local time of day, device type, device OS, etc.), with over 90% of values overlapping per feature.
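For reference, here is a minimal sketch of one way a per-feature overlap figure like the one above could be computed. It assumes "overlap" means the share of target events whose value for a feature also occurs somewhere in the source data; the function name `value_overlap` and the toy rows are made up for illustration and are not my real data.

```python
def value_overlap(source_rows, target_rows, feature):
    """Share of target rows whose value for `feature` also occurs in source."""
    source_values = {row[feature] for row in source_rows}
    if not target_rows:
        return 0.0
    hits = sum(1 for row in target_rows if row[feature] in source_values)
    return hits / len(target_rows)


# Toy data (hypothetical), standing in for the real event logs.
source = [
    {"device_type": "mobile", "day_of_week": "Mon"},
    {"device_type": "desktop", "day_of_week": "Tue"},
    {"device_type": "mobile", "day_of_week": "Wed"},
]
target = [
    {"device_type": "mobile", "day_of_week": "Mon"},
    {"device_type": "tablet", "day_of_week": "Tue"},
    {"device_type": "desktop", "day_of_week": "Sun"},
    {"device_type": "mobile", "day_of_week": "Wed"},
]

overlap = {
    feature: value_overlap(source, target, feature)
    for feature in ("device_type", "day_of_week")
}
print(overlap)
```

Note that a high overlap of values says nothing about whether the values occur with the same frequencies in both datasets, so this check alone cannot rule out a shift in the feature distributions.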

A random forest model trained on the source data predicts held-out source events well, with no signs of overfitting.

[Figure: source predicting source. x axis: actual | y axis: prediction]

To account for the different conversion rates, I include the n-sample trailing conversion rate as a feature.
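To make the trailing feature concrete, here is a rough sketch of how it could be computed. It assumes events are sorted by time, that each event carries a 0/1 conversion flag, and that the rate for an event uses only the previous `n` events (not the event itself), so the label does not leak into its own feature. The function name and the toy sequence are illustrative only.

```python
from collections import deque


def trailing_conversion_rate(conversions, n):
    """For each event, the conversion rate over the previous (up to) n events.

    The current event is excluded from its own window. The first event has
    no history, so its rate is None.
    """
    window = deque(maxlen=n)  # holds the last n conversion flags seen so far
    rates = []
    for converted in conversions:
        rates.append(sum(window) / len(window) if window else None)
        window.append(converted)
    return rates


# Hypothetical, time-ordered conversion flags (1 = converted).
events = [0, 1, 0, 0, 1, 0]
rates = trailing_conversion_rate(events, n=3)
print(rates)
```

If the window were instead allowed to include the current event, the feature would partly encode the label, which could make in-sample results look better than they are.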

I have tried training on different mixes of source and target data, but none of the resulting models can predict test target events.

[Figure: target+source mix predicting target. x axis: actual | y axis: prediction]
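As an illustration of what a "mix" can look like, one option is to keep all events but weight them so that the much smaller target set is not drowned out by the source set. The sketch below is only an assumption about one possible weighting scheme, not necessarily the exact mixes tried above; `domain_balanced_weights` is a made-up helper name.

```python
def domain_balanced_weights(domains):
    """Per-event weights so that each domain contributes equal total weight.

    `domains` is a list of labels such as "source" or "target", one per event.
    The weights are scaled so that they sum to the number of events.
    """
    counts = {}
    for domain in domains:
        counts[domain] = counts.get(domain, 0) + 1
    n_domains = len(counts)
    total = len(domains)
    return [total / (n_domains * counts[domain]) for domain in domains]


# Toy example: 8 source events and 2 target events.
domains = ["source"] * 8 + ["target"] * 2
weights = domain_balanced_weights(domains)

source_total = sum(w for w, d in zip(weights, domains) if d == "source")
target_total = sum(w for w, d in zip(weights, domains) if d == "target")
print(source_total, target_total)
```

Weights like these could then be passed through the `sample_weight` argument of scikit-learn's `RandomForestClassifier.fit`, with evaluation still done on held-out target events only.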

Given that the events are recorded with the same measurement techniques and there are 20k target events, I'm hesitant to conclude that the target data is too noisy.

What techniques can I try?


