# Are the balls drawn randomly from several urns?

by Sulawesi   Last Updated June 15, 2015 23:08 PM

There are $r$ urns that contain different numbers of balls. The number of balls (before sampling) in the $i^{th}$ urn is $N_i$. John sampled $x$ balls in total (without replacement) from all those urns. The numbers of balls coming from each urn are $x_1, x_2, \dots, x_{r-1}, x_r$.

The question I am trying to answer: did each ball have the same probability of being sampled, or does the probability that a given ball is sampled depend on the size of its urn?

Consider, for example, the following dataset. There are 3 urns which contain 88, 8 and 3 balls respectively ($N_1 = 88, N_2 = 8, N_3 = 3$). There are therefore 99 balls in total ($N_{tot} = 99$). John sampled 17 balls out of those three urns ($x = 17$): 10 balls from urn 1, 5 balls from urn 2 and 2 balls from urn 3 ($x_1 = 10, x_2 = 5, x_3 = 2$). Did all the balls have the same probability of being sampled?
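To make the example concrete, here is a minimal sketch of the counts one would expect from each urn if every ball were equally likely to be sampled, using $E[x_i] = x \cdot N_i / N_{tot}$. The variable names are illustrative, not part of the original question.

```python
# Urn sizes and observed counts from the example dataset.
urn_sizes = [88, 8, 3]
observed = [10, 5, 2]

n_total = sum(urn_sizes)  # N_tot
n_drawn = sum(observed)   # x

# Expected count per urn if every ball has the same chance of being drawn.
expected = [n_drawn * n / n_total for n in urn_sizes]

for i, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"urn {i}: observed {obs}, expected {exp:.2f}")
```

Comparing the two columns gives a first impression of which urns look over- or underrepresented before any formal test.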

Here is my attempt to test for this question:

If every ball is equally likely to be sampled, $x_i$ should follow a hypergeometric distribution with parameters $N_i$ (successes), $N_{tot}-N_i$ (failures) and $x$ (total number of balls drawn), where $N_{tot} = \sum_{i=1}^r N_i$ is the total number of balls in the $r$ urns. I can compare $x_i$ to the median of this hypergeometric distribution to tell, for each urn, whether the urn was overrepresented or underrepresented in the sample. Let's call this binary variable $B$.
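A rough sketch of this step, using only the Python standard library. The helper names `hypergeom_pmf` and `hypergeom_median` are hypothetical, the median is taken as the smallest value whose cumulative probability reaches 0.5, and coding a tie with the median as 0 is an assumption, since the description above only mentions two outcomes.

```python
from math import comb

def hypergeom_pmf(k, total, successes, draws):
    """P(exactly k successes among `draws` balls drawn without replacement)."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)

def hypergeom_median(total, successes, draws):
    """Smallest k whose cumulative probability reaches 0.5."""
    lo = max(0, draws - (total - successes))  # fewest successes possible
    hi = min(successes, draws)                # most successes possible
    cumulative = 0.0
    for k in range(lo, hi + 1):
        cumulative += hypergeom_pmf(k, total, successes, draws)
        if cumulative >= 0.5:
            return k
    return hi  # guard against floating-point shortfall

urn_sizes = [88, 8, 3]
observed = [10, 5, 2]
n_total = sum(urn_sizes)
n_drawn = sum(observed)

medians = [hypergeom_median(n_total, n, n_drawn) for n in urn_sizes]

# Assumption: B is +1 above the median, -1 below it, and 0 when equal to it.
B = [(x > m) - (x < m) for x, m in zip(observed, medians)]

print(medians, B)
```

For the example dataset this marks urn 1 as underrepresented and urns 2 and 3 as overrepresented.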

Then I can compute a Spearman correlation between $B$ and the size of the urn. If the p-value of the Spearman correlation is low enough, I can conclude that the balls were not drawn at random from all those urns.
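The correlation step might look like the sketch below, which computes Spearman's $\rho$ as the Pearson correlation of average ranks. The helpers `average_ranks` and `spearman_rho` are hypothetical names, and the values of `B` are illustrative codes (-1 for under, +1 for over) assumed for the three urns of the example. No p-value is computed here; a statistics library would be needed for that, and with only three urns it would carry very little information.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = shared
        start = end + 1
    return ranks

def spearman_rho(a, b):
    """Pearson correlation of the ranks; undefined if either input is constant."""
    ra, rb = average_ranks(a), average_ranks(b)
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(ra, rb))
    var_a = sum((p - mean_a) ** 2 for p in ra)
    var_b = sum((q - mean_b) ** 2 for q in rb)
    return cov / sqrt(var_a * var_b)

urn_sizes = [88, 8, 3]
B = [-1, 1, 1]  # illustrative over/under codes, assumed for this sketch

rho = spearman_rho(urn_sizes, B)
print(f"Spearman rho = {rho:.3f}")
```

A negative $\rho$ here would mean that larger urns tend to be underrepresented in the sample.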

Does this seem like a good methodology? Would it be a test with low power, or would it be completely wrong?
