by joshphysics
Last Updated April 14, 2018 22:19 PM

Suppose that we construct a probability model for some process which we believe generates random numbers by independently sampling from a probability density function $p_{\theta}$ characterized by a parameter (or vector of parameters) $\theta$.

Suppose further that we want to infer the value of some statistic $S(\theta)$ along with our confidence in our inferred value, and we decide to do this using the standard, frequentist algorithm which, as I currently understand it, proceeds as follows:

- We run the underlying process $N$ times and thus draw $N$ independent samples from the distribution $p_\theta$.
- We infer the value of $\theta$ by computing some reasonable estimator $\hat\theta$ using our samples. This induces an inference $p_{\hat\theta}$ for the underlying distribution we're sampling from and, if we wish, an estimate $S(\hat\theta)$ of the statistic we care about.
- We use $p_{\hat\theta}$ to infer the sampling distribution of the statistic $S$. If $p_{S, \theta, N}$ denotes the true sampling distribution for samples of size $N$, then the inferred sampling distribution is $p_{S, \hat\theta, N}$.
- We find the smallest interval $I_{\alpha}$ containing $100(1-\alpha)\%$ of the mass of the distribution $p_{S, \hat\theta, N}$. As I understand it, this is the $100(1-\alpha)\%$ confidence interval for our estimate of $S$ given our sample.
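To make the four steps above concrete, here is a minimal sketch for one simple case. It assumes a normal model with known standard deviation, takes $\theta$ to be the mean, and takes $S(\theta) = \theta$; the function name `confidence_interval` and all numerical values are hypothetical choices for illustration, not part of the general procedure.

```python
import random
from statistics import NormalDist, fmean


def confidence_interval(samples, sigma, alpha):
    """Steps 2-4 for an assumed normal model with known sigma and S(theta) = theta."""
    n = len(samples)
    # Step 2: estimate theta with the sample mean.
    theta_hat = fmean(samples)
    # Step 3: inferred sampling distribution of the sample mean,
    # N(theta_hat, sigma^2 / n), obtained by plugging theta_hat in for theta.
    inferred = NormalDist(mu=theta_hat, sigma=sigma / n ** 0.5)
    # Step 4: for a symmetric unimodal density, the shortest interval holding
    # 1 - alpha of the mass is the central one.
    lo = inferred.inv_cdf(alpha / 2)
    hi = inferred.inv_cdf(1 - alpha / 2)
    return lo, hi


# Step 1: draw N independent samples (true theta is known here only because
# this is a simulation).
rng = random.Random(0)
theta_true, sigma, n, alpha = 2.0, 1.0, 50, 0.05
samples = [rng.gauss(theta_true, sigma) for _ in range(n)]

lo, hi = confidence_interval(samples, sigma, alpha)
print(f"{100 * (1 - alpha):.0f}% interval: [{lo:.3f}, {hi:.3f}]")
```

In this special case the interval is centered on $\hat\theta$ and its width does not depend on the sample, which will not be true for a general model.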

Assuming I have described the basic procedure for computing frequentist confidence intervals correctly, my understanding is that the following claim is often made and used as the basis for interpreting these confidence intervals:

> If one were to repeat the procedure above a large number of times, then $100(1-\alpha)\%$ of the computed intervals $I_\alpha$ would contain the true value $S(\theta)$ of the statistic.
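The claim can at least be checked numerically in a simple case, which is of course not a proof. The sketch below assumes a normal model with known standard deviation and $S(\theta) = \theta$; the helper names `interval` and `coverage` and the parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def interval(samples, sigma, alpha):
    """Central 1 - alpha interval of N(sample mean, sigma^2 / n)."""
    n = len(samples)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / n ** 0.5
    center = fmean(samples)
    return center - half_width, center + half_width


def coverage(theta_true, sigma, n, alpha, repeats, seed=0):
    """Fraction of repeated intervals that contain the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Each repetition draws a fresh sample and recomputes the interval.
        samples = [rng.gauss(theta_true, sigma) for _ in range(n)]
        lo, hi = interval(samples, sigma, alpha)
        hits += lo <= theta_true <= hi
    return hits / repeats


frac = coverage(theta_true=2.0, sigma=1.0, n=30, alpha=0.05, repeats=5000)
print(f"fraction of intervals containing the true value: {frac:.3f}")
```

The fraction should land near $1-\alpha$ here, but this particular model is unusually well behaved, so the simulation says nothing about whether the claim holds for a general $p_\theta$ and estimator.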

**Question.** How does one prove this claim?

In particular, I understand that the confidence interval $I_\alpha$ computed from a single inferred sampling distribution $p_{S, \hat \theta, N}$ would approximately (since we don't have the true value of $\theta$) contain $100(1-\alpha)\%$ of the values $S(\hat \theta)$ if one were to generate a large number of samples of size $N$, but it's not at all clear to me how this implies the statement quoted above about the frequency of confidence intervals that would contain the true value $S(\theta)$.
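To spell out the gap in symbols (this is only a restatement, not a proof, and the notation is introduced here for illustration): let $x$ be the observed sample, $I_\alpha(x)$ the interval computed from it, and $X'$ a hypothetical fresh sample of size $N$. What the construction gives, with the interval held fixed and the estimate random, is

$$
\Pr_{X' \sim p_{\hat\theta(x)}}\!\left[\, S\big(\hat\theta(X')\big) \in I_\alpha(x) \,\right] = 1 - \alpha ,
$$

whereas the claim being asked about, with the true value held fixed and the interval random, is

$$
\Pr_{X \sim p_{\theta}}\!\left[\, S(\theta) \in I_\alpha(X) \,\right] = 1 - \alpha .
$$

The first probability is taken under the fitted distribution $p_{\hat\theta(x)}$ and the second under the true distribution $p_\theta$, so the two statements differ both in which quantity is random and in which distribution generates the data.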
