How does one prove that a certain percentage of confidence intervals contain the true value of a statistic?

by joshphysics   Last Updated April 14, 2018 22:19 PM

Suppose that we construct a probability model for some process which we believe generates random numbers by independently sampling from a probability density function $p_{\theta}$ characterized by a parameter (or vector of parameters) $\theta$.

Suppose further that we want to infer the value of some statistic $S(\theta)$ along with our confidence in our inferred value, and we decide to do this using the standard, frequentist algorithm which, as I currently understand it, proceeds as follows:

  1. We run the underlying process $N$ times and thus draw $N$ independent samples from the distribution $p_\theta$.
  2. We infer the value of $\theta$ by computing some reasonable estimator $\hat\theta$ using our samples. This induces an inference $p_{\hat\theta}$ for the underlying distribution we're sampling from and, if we wish, an estimate $S(\hat\theta)$ of the statistic we care about.
  3. We use $p_{\hat\theta}$ to infer the sampling distribution of the statistic $S$. Let $p_{S, \theta, N}$ be the true sampling distribution for samples of size $N$; the inferred sampling distribution is then $p_{S, \hat\theta, N}$.
  4. We find the smallest interval $I_{\alpha}$ containing $100(1-\alpha)\%$ of the mass of the distribution $p_{S, \hat\theta, N}$. As I understand it, this is the $100(1-\alpha)\%$ confidence interval for our estimate of $S$ given our sample.
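
To make the four steps concrete, here is a minimal sketch for one specific case that I am choosing purely for illustration: $p_\theta$ is a normal density with unknown mean $\theta$ and a known standard deviation, $S(\theta) = \theta$, and $\hat\theta$ is the sample mean. The function name `plug_in_interval` and all numerical values are hypothetical, not part of any standard library.

```python
import random
from statistics import NormalDist, fmean


def plug_in_interval(samples, sigma, alpha):
    """Steps 2-4 for a normal model with known sigma, where S(theta) = theta."""
    n = len(samples)
    # Step 2: estimate theta by the sample mean.
    theta_hat = fmean(samples)
    # Step 3: inferred sampling distribution of the sample mean,
    # obtained by plugging theta_hat into N(theta, sigma^2 / n).
    inferred = NormalDist(theta_hat, sigma / n ** 0.5)
    # Step 4: for a symmetric, unimodal density the shortest interval
    # holding 1 - alpha of the mass is the central one.
    return inferred.inv_cdf(alpha / 2), inferred.inv_cdf(1 - alpha / 2)


# Assumed values, chosen only for the example.
theta_true, sigma, n, alpha = 2.0, 1.0, 50, 0.05
rng = random.Random(0)

# Step 1: draw N independent samples from p_theta.
samples = [rng.gauss(theta_true, sigma) for _ in range(n)]

lo, hi = plug_in_interval(samples, sigma, alpha)
print(f"theta_hat = {fmean(samples):.3f}, interval = [{lo:.3f}, {hi:.3f}]")
```

In this special case the interval reduces to $\hat\theta \pm z_{1-\alpha/2}\,\sigma/\sqrt{N}$; for other models step 3 would generally need a different (possibly simulated) sampling distribution.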

Assuming I have described the basic procedure for computing frequentist confidence intervals correctly, my understanding is that the following claim is often made and used as the basis for interpreting these confidence intervals:

If one were to repeat the procedure above a large number of times, then $100(1-\alpha)\%$ of the computed intervals $I_\alpha$ would contain the true value $S(\theta)$ of the statistic.

Question. How does one prove this claim?

In particular, I understand that the confidence interval $I_\alpha$ computed from a single inferred sampling distribution $p_{S, \hat \theta, N}$ would approximately (since we don't have the true value of $\theta$) contain $100(1-\alpha)\%$ of the estimates $S(\hat \theta)$ if one were to generate a large number of samples of size $N$, but it's not at all clear to me how this implies the statement quoted above about the frequency with which the computed intervals contain the true value $S(\theta)$.
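
The repeated-sampling claim can at least be checked numerically, even though a simulation is not a proof. The sketch below assumes the same illustrative setting as before (normal data, unknown mean $\theta$, known standard deviation, $S(\theta) = \theta$, sample mean as $\hat\theta$); the names `plug_in_interval` and `empirical_coverage` and all parameter values are hypothetical. It repeats the whole procedure many times and records the fraction of intervals that contain the true value.

```python
import random
from statistics import NormalDist, fmean


def plug_in_interval(samples, sigma, alpha):
    """Central 1 - alpha interval of the inferred sampling distribution."""
    n = len(samples)
    theta_hat = fmean(samples)
    inferred = NormalDist(theta_hat, sigma / n ** 0.5)
    return inferred.inv_cdf(alpha / 2), inferred.inv_cdf(1 - alpha / 2)


def empirical_coverage(theta_true, sigma, n, alpha, repeats, seed=0):
    """Fraction of repeated intervals that contain theta_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Fresh sample of size n on every repetition.
        samples = [rng.gauss(theta_true, sigma) for _ in range(n)]
        lo, hi = plug_in_interval(samples, sigma, alpha)
        hits += lo <= theta_true <= hi
    return hits / repeats


# Assumed values, chosen only for the example.
coverage = empirical_coverage(
    theta_true=2.0, sigma=1.0, n=50, alpha=0.05, repeats=5000
)
print(f"fraction of intervals containing the true value: {coverage:.3f}")
```

In this particular model the agreement with $1-\alpha$ is not an accident: the distribution of $\hat\theta - \theta$ does not depend on $\theta$, so the event "$\hat\theta$ lies within a given distance of $\theta$" is the same event as "the interval centred on $\hat\theta$ contains $\theta$". For a general model the plug-in construction described above is not guaranteed to reach the nominal frequency, which is exactly the gap my question is about.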
