How to estimate variance of underlying noise?

by Prateek Arora   Last Updated September 13, 2017 14:19 PM

I am a beginner in machine learning with Python. I have been given a set of corresponding values for y and x such that y is a polynomial of unknown degree in x. Gaussian noise with zero mean and unknown variance has been added to the data, so the y values I have are actually y = f(x) + E, where E is the added Gaussian noise. Is there a way to estimate the variance of this underlying noise (for instance, a built-in function in Python)? Any help would be appreciated.



Answers 1


If your model is

$$ y_i = f(X_i) + \varepsilon_i $$

with Gaussian noise $\varepsilon_i$ having mean zero and unknown variance $\sigma^2$, then it translates to

$$ y_i | X_i \sim \mathcal{N}(f(X_i), \sigma^2) $$

To calculate the variance of the errors, you first need to estimate the mean of the conditional distribution, $E(y_i|X_i) = f(X_i)$, since variance is the expected squared distance from the mean. Then you simply estimate the variance by substituting

$$ \newcommand{\Var}{\mathrm{Var}} \Var(Y) = E[(Y - E(Y))^2] $$

with

$$ \widehat{\Var}(Y) = \mathrm{mean}((Y - \hat f(X))^2) $$

where $\hat f(X)$ is your estimate of $f(X)$.

TL;DR: You need to estimate $f(X)$ and then calculate the variance of the residuals; this will be your estimate of the noise variance.
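As a rough illustration of the recipe above, here is a minimal Python sketch. It assumes the polynomial degree is already known or has been chosen some other way (the question says it is unknown, so in practice you would have to pick it first, e.g. by cross-validation). The function name `estimate_noise_variance`, the `unbiased` flag, and the simulated data are made up for this example; only `numpy.polyfit` and `numpy.polyval` are real library calls. With `unbiased=False` it computes the plain mean of squared residuals as in the formula above; with `unbiased=True` it divides by $n - p$ instead of $n$, where $p$ is the number of fitted coefficients, which is the usual degrees-of-freedom correction for least squares.

```python
import numpy as np


def estimate_noise_variance(x, y, degree, unbiased=False):
    """Fit a polynomial of the given degree and return the residual variance.

    `degree` is an assumption supplied by the caller; it is not estimated here.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Estimate f(X) by ordinary least squares on a polynomial basis.
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)

    # Residual sum of squares divided by n (plain mean) or by n - p.
    rss = np.sum(residuals ** 2)
    n_params = degree + 1
    denom = len(y) - n_params if unbiased else len(y)
    return rss / denom


# Simulated data (hypothetical): a quadratic plus Gaussian noise with sigma = 0.5.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 500)
true_sigma = 0.5
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, true_sigma, x.size)

sigma2_hat = estimate_noise_variance(x, y, degree=2)
sigma2_hat_unbiased = estimate_noise_variance(x, y, degree=2, unbiased=True)
print(sigma2_hat, sigma2_hat_unbiased, true_sigma ** 2)
```

Both estimates should land close to the true variance of 0.25 here. Keep in mind that if the chosen degree is too low, the residuals also contain model error and the variance is overestimated; if it is too high, the fit absorbs part of the noise and the plain mean underestimates it.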

Tim
September 13, 2017 13:46 PM
