by Peter Nash
Last Updated May 14, 2015 08:08 AM

For t-tests, according to most texts there's an assumption that the population data is normally distributed. I don't see why that is. Doesn't a t-test only require that the sampling distribution of sample means is normally distributed, and not the population?

If it is the case that the t-test only ultimately requires normality in the sampling distribution, the population can look like any distribution, right? So long as there is a reasonable sample size. Is that not what the central limit theorem states?

(I'm referring here to one-sample or independent samples t-tests)

> For t-tests, according to most texts there's an assumption that the population data is normally distributed. I don't see why that is. Doesn't a t-test only require that the sampling distribution of sample means is normally distributed, and not the population?

The t-statistic consists of a ratio of two quantities, both random variables. It doesn't just consist of a numerator.

For the t-statistic to have the t-distribution, you need not just that the sample mean have a normal distribution. You also need:

- that the $s$ in the denominator be such that $d\,s^2/\sigma^2 \sim \chi^2_d$*
- that the numerator and denominator be independent.

*(the value of $d$ depends on which test -- in the one-sample $t$ we have $d=n-1$)

For those three things to be actually true, you need that the original data are normally distributed.
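A quick way to see all three conditions working together is to simulate: with normally distributed data, the one-sample t-statistic matches the $t_{n-1}$ distribution even at tiny $n$. A sketch using numpy/scipy (the seed, sample size, and population parameters here are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000

# reps independent samples of size n from a normal population
x = rng.normal(loc=0.0, scale=2.0, size=(reps, n))

# one-sample t-statistic for H0: mu = 0
t_vals = x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# the empirical 97.5% quantile of the simulated statistics should sit
# near the theoretical t quantile with n-1 = 4 df (about 2.776)
emp_q = np.quantile(t_vals, 0.975)
theor_q = stats.t.ppf(0.975, df=n - 1)
```

Repeating this with a skewed population in place of `rng.normal` shows the quantiles drifting away from the $t_{n-1}$ values, which is the point of the conditions above.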

> If it is the case that the t-test only ultimately requires normality in the sampling distribution, the population can look like any distribution, right?

Let's take iid as given for a moment. For the CLT to hold, the population has to satisfy its conditions -- it has to have a distribution to which the CLT applies (in the classical version, one with finite variance). So no, since there are population distributions for which the CLT doesn't apply.
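The standard Cauchy is the classic counterexample: it has no mean or variance, and the average of $n$ iid standard Cauchy draws is itself standard Cauchy, no matter how large $n$ gets. A small sketch (the sizes and seed are my choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1_000, 5_000

# sample means of n iid standard Cauchy draws
means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)

# if averaging helped, the spread would shrink like 1/sqrt(n);
# instead the IQR of the means stays at the standard Cauchy's IQR of 2
iqr = np.quantile(means, 0.75) - np.quantile(means, 0.25)
```

No amount of extra data makes the sampling distribution of this mean approach a normal.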

> So long as there is a reasonable sample size. Is that not what the central limit theorem states?

No, the CLT actually says not one word about "reasonable sample size".

It actually says nothing at all about what happens at any finite sample size.

I'm thinking of a specific distribution right now. It's one to which the CLT *certainly does* apply. But at $n=10^{15}$, the distribution of the sample mean is plainly non-normal. Yet I doubt that any sample in the history of humanity has ever had that many values in it. So - outside of tautology - what does 'reasonable $n$' mean?
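One way to see how "reasonable $n$" can be pushed out indefinitely: for iid data the skewness of the sample mean shrinks only like $1/\sqrt{n}$,

$$\operatorname{skew}(\bar X_n) = \frac{\operatorname{skew}(X)}{\sqrt{n}},$$

and a family with enormous skewness (my illustration -- not necessarily the distribution the author has in mind) is the lognormal, whose skewness is $(e^{\sigma^2}+2)\sqrt{e^{\sigma^2}-1}$. With $\sigma = 10$ the population skewness is roughly $e^{150}$, so even at $n = 10^{15}$ the mean's skewness is still about $e^{150}/10^{7.5}$ -- nowhere near the zero skewness of a normal.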

So you have twin problems:

A. The effect that people usually attribute to the CLT -- the increasingly close approach to normality of the distributions of sample means at small/moderate sample sizes -- isn't actually stated in the CLT**.

B. "Something not so far from normal in the numerator" isn't enough to give the statistic a t-distribution.

**(Something like the Berry-Esseen theorem gets you more like what people are seeing when they look at the effect of increasing sample size on distribution of sample means.)
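For reference, the Berry-Esseen theorem bounds the worst-case gap between the CDF $F_n$ of the standardized sample mean and the standard normal CDF $\Phi$ at every finite $n$ (stated here for iid $X_i$ with $\mathrm{E}|X_i - \mu|^3 = \rho < \infty$):

$$\sup_x \left| F_n(x) - \Phi(x) \right| \le \frac{C\,\rho}{\sigma^3 \sqrt{n}},$$

where $C$ is an absolute constant (known to be less than $0.48$). Unlike the CLT, this gives an explicit rate, and it shows how a large $\rho/\sigma^3$ -- heavy skewness or heavy tails -- can keep the approximation poor until $n$ is enormous.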

The CLT and Slutsky's theorem together give you (as long as all their assumptions hold) that as $n\to\infty$, the distribution of the t-statistic approaches standard normal. It doesn't say whether any given finite $n$ might be enough for some purpose.
