Hypothesis Testing Glossary
Why So Weary?
When I try to read about statistics I get mired in the jargon. Even just moving past the phrase, “For a given parameterized distribution,” requires that I think about what it means for something to be “parameterized” and what a “distribution” is. I wind up reading in that plodding, word-by-word way that I might read a foreign language I happened to be studying. It’s exhausting.
Here I’ve gathered notes from the lessons I’ve done so far in my data science boot camp to define the salient terms.
The Basics
A hypothesis posits a relationship between two variables. A scientific experiment tests the hypothesis by comparing the value measured in a control group and the value measured in a treatment group. We assume that there is no relationship between the two variables. So, most of the time, the value taken from the treatment group will be close to the value taken from the control group. Once in a while, due to random chance, the value will be very different. If this chance is low enough, and our experiment shows it has happened, then we may conclude that our assumption about the lack of a relationship between the two variables was wrong. We cannot make the more satisfying conclusion that there is a relationship between the two, exactly; we can only conclude that there isn’t an absence of a relationship. The edifice of science rises by the gradual accumulation of cautious conclusions.
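Here is a minimal sketch of that workflow in Python, using made-up measurements and scipy’s independent-samples t-test; all the numbers are invented for illustration.

```python
# A minimal sketch of the hypothesis-testing workflow, with made-up data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=100, scale=15, size=50)    # control group measurements
treatment = rng.normal(loc=108, scale=15, size=50)  # treatment group measurements

alpha = 0.05  # chosen before the study
t_stat, p_value = stats.ttest_ind(treatment, control)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```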
Glossary
Use CTRL + F (Windows) or CMD + F (Mac) to hop to the term you want.
A
alpha value (α): A significance threshold, often 0.05. Choose one before running your study; it determines the critical value that bounds the rejection region. Compare the P-value obtained from your study to it to decide whether to reject the null hypothesis. Sets the (hopefully small) probability of a type-1 error, a “false positive” result which leads you to reject the null hypothesis even though it is correct.
alternative hypothesis (H1): “There is a relationship between” the variables being compared. Accepted after rejection of the null hypothesis. In mathematical notation, will always have <, >, or !=.
B
beta value (β): The probability of a type-2 error, a “false negative” result which leads you to accept the null hypothesis even though it is wrong. The power of a test is 1 − β. Compare to alpha.
C
Central Limit Theorem: A convenient miracle: the means of many sufficiently large samples taken from a non-normal distribution will themselves form an approximately normal distribution. Allows for the prediction of the parameters of a population distribution through the much easier prediction of the parameters of the sampling distribution.
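A quick simulation shows the theorem at work: sample means drawn from a strongly skewed distribution still cluster into a bell shape. The exponential distribution and its parameters here are arbitrary choices for illustration.

```python
# Means of many samples from a skewed (exponential) distribution
# form an approximately normal sampling distribution.
import numpy as np

rng = np.random.default_rng(0)
sample_means = [rng.exponential(scale=2.0, size=100).mean() for _ in range(10_000)]

print(f"mean of sample means: {np.mean(sample_means):.3f}")  # close to 2.0
print(f"std of sample means:  {np.std(sample_means):.3f}")   # close to 2.0 / sqrt(100) = 0.2
```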
confidence interval: A range of values built around a point estimate from a sample, intended to capture the actual value of a parameter in the population.
confidence level: The percentage of a sampling distribution over which your desired confidence interval spreads. e.g. For a confidence level of 95%, the confidence interval will contain all values up to 1.96 standard deviations away from the mean of a normal distribution. (1.96 is the critical z score here.) i.e. If you built such an interval from many repeated samples, about 95% of them would capture the true mean of the population. But all this depends on your knowing the standard deviation of the population, which is rare.
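For instance, assuming you somehow do know the population standard deviation, a 95% interval around a sample mean could be computed like this (a sketch with invented numbers):

```python
# 95% confidence interval for a population mean,
# assuming the population standard deviation is known.
import math
from scipy import stats

sample_mean = 52.3  # hypothetical point estimate
sigma = 8.0         # (rarely) known population standard deviation
n = 100             # sample size

z = stats.norm.ppf(0.975)  # ~1.96 for a 95% confidence level
margin = z * sigma / math.sqrt(n)
print(f"95% CI: ({sample_mean - margin:.2f}, {sample_mean + margin:.2f})")
```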
critical value: The point on the distribution that separates the rejection region from the failure-to-reject region; set by your chosen alpha (or beta). Used in comparison to the test statistic.
D
degrees of freedom (df): Sample size minus one (n − 1). The higher this is, the more normal a t distribution looks.
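A small illustration of that effect with scipy: the critical t-value shrinks toward the normal distribution’s 1.96 as the degrees of freedom rise.

```python
# As degrees of freedom rise, the t distribution's critical value
# approaches the normal distribution's 1.96.
from scipy import stats

for df in (2, 10, 30, 100):
    print(f"df={df:>3}: t critical = {stats.t.ppf(0.975, df):.3f}")
print(f"normal z critical = {stats.norm.ppf(0.975):.3f}")
```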
distribution: A set of values plotted as a curve. In probability, the values are the results of many trials of an event. The bulk of the values are often clustered symmetrically around the mean value while the rarer, outlying values are plotted along the “tails”. Can have discrete or continuous values; can have measurable “parameters”; can have a “normal” shape or a squatter “t” shape; can plot the values of a population or parameter values of samples from that population; &c.
F
failure-to-reject region: The set of values outside one or both tails of a distribution in which the test statistic must fall in order to fail to reject (i.e. accept) the null hypothesis.
G
Gaussian distribution: A normal distribution.
H
hypothesis: A prediction about a quantitative relationship between variables. Can concern a proportion (e.g. “Some of…,” “Most of…”) or a mean. In statistics, it is written as two statements: a null hypothesis, with “=”, predicting no relationship between variables, and an alternative hypothesis, with “<,” “>,” or “!=,” predicting a relationship.
M
maximum likelihood estimation (MLE): A method to find a parameter (e.g. mean or standard deviation) of a distribution by plotting the likelihood of the observed data over the possible values of that parameter. At the maximum of this likelihood curve, where the tangent line will have a slope of 0 (like a wooden board balanced perfectly level on the peak of a mountain), lies the most likely value of the parameter of the original distribution.
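A crude sketch of the idea: scan candidate values for a distribution’s mean and keep the one that maximizes the log-likelihood of the data. (Real libraries optimize analytically or numerically; the data, grid, and fixed standard deviation here are all invented for illustration.)

```python
# Grid-search MLE for the mean of normally distributed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=5.0, scale=2.0, size=500)

candidates = np.linspace(3.0, 7.0, 401)  # possible values of the mean
# Hold the standard deviation fixed for simplicity; sum the log-likelihoods.
log_likelihoods = [stats.norm.logpdf(data, loc=mu, scale=2.0).sum() for mu in candidates]

best = candidates[np.argmax(log_likelihoods)]  # the peak of the likelihood curve
print(f"MLE of the mean: {best:.3f}")  # should land near 5.0
```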
mu, lower-case (μ): The mean (average) parameter of a distribution.
N
n: The size of a sample taken from a population.
normal distribution: The “bell curve” or Gaussian distribution. Its well-known parameters allow for easy prediction of the probability of the values that fall on its curve. The distributions of many phenomena in the world take this shape, even, amazingly, the distribution of the means of many samples taken from a non-normal distribution. See Central Limit Theorem.
null hypothesis (H0): “There is no relationship between” the variables being compared. In mathematical notation, will have an = sign. Rejecting it leads you to accept the alternative hypothesis, which is as close as you can get to proving a relationship between two variables. “Failure to reject” is not proof of an absence of a relationship, only a lack of evidence for one. (If you’re looking for statistical significance, rejection is “good”; failure to reject is “bad.”)
O
one-tailed test: An experiment that tests whether a value for a treatment group is specifically less than, or specifically greater than, the value for a control group. The “tail” is the rejection region, the small range of values at one end of the distribution in which the test statistic must fall in order to reject the null hypothesis.
P
parameter: A measure, e.g. mean or variance, of a population distribution.
point estimate: A statistic (e.g. mean or variance) computed from a sample. May be used as a proxy for the same parameter of the sample’s overall population. As more and more of these accumulate, they begin to form a probability distribution. e.g. Many point estimates of the mean of a population form a normal distribution which allows you to predict the actual mean of that population within a specific confidence interval. See Central Limit Theorem.
P-value: The probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one your study produced. (A different value than the proportion “p” used to write the null and alternative hypotheses.) A value between 0 and 1. If your chosen alpha value is 0.05, then a P-value of 0.05 or smaller is enough to reject the null hypothesis, i.e. to conclude that there is a relationship between the two variables under observation.
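For a two-tailed z test, the P-value can be computed directly from the test statistic; a sketch with a made-up z score:

```python
# Two-tailed P-value from a z score: the probability of a result
# at least this extreme, assuming the null hypothesis is true.
from scipy import stats

z = 2.2  # hypothetical test statistic
p_value = 2 * stats.norm.sf(abs(z))  # sf = 1 - cdf (the upper tail)
print(f"P-value: {p_value:.4f}")  # ~0.0278, below alpha = 0.05, so reject
```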
R
rejection region: The set of values in one or both tails of a distribution in which the test statistic must fall in order to reject the null hypothesis.
R squared (R²): Proportion of variance explained. Value between 0 and 1. e.g. “91% of the change in the y value can be explained by x.” Good for comparing multiple models of the same data; on its own, a single value is hard to judge, since what counts as “good” depends on the domain.
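A quick by-hand computation with NumPy (the true values and model predictions are hypothetical):

```python
# R squared: 1 minus (residual sum of squares / total sum of squares).
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.3, 8.9])  # hypothetical model predictions

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R² = {r_squared:.3f}")
```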
S
S: The standard deviation of a sample. Compare to σ, that of a population.
sample: A set of values of size n drawn from a population.
sampling distribution: Assembled by plotting a particular statistic (e.g. the mean) from many samples taken from a population. e.g. Plot the means of many samples to create a sampling distribution and it will look close to normal. See Central Limit Theorem.
sigma, lower-case (σ): The standard deviation parameter of a population distribution.
sigma, upper-case (Σ): Summation: the sum of a set of values, e.g. the elements of a distribution.
standardization: To put features on a similar scale for comparison. Does not change the shape of a distribution. Compare to transformation.
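A typical implementation subtracts the mean and divides by the standard deviation (giving each value a z score); a minimal sketch with an invented feature:

```python
# Standardize a feature: subtract the mean, divide by the standard deviation.
# The shape of the distribution is unchanged; only the scale moves.
import numpy as np

feature = np.array([120.0, 135.0, 150.0, 180.0, 210.0])
standardized = (feature - feature.mean()) / feature.std()

print(standardized)         # now centered on 0 with a standard deviation of 1
print(standardized.mean())  # ~0.0
```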
T
t distribution: Inspires less confidence than a normal distribution due to its higher standard deviation and resulting “fatter” tails. Used when a population’s standard deviation is not known. Its confidence intervals are measured with a t-value in place of a standard-deviation-derived z score. As its degrees of freedom rise, its shape in fact approaches a normal distribution.
test statistic: Can be a t-value, z score, &c. Measures where along the distribution the value taken from your study falls. Compare it to the critical value you chose to see if it lands in the rejection region. (Equivalently, compare its P-value to alpha.)
transformation: To normalize a distribution, make it more symmetrical, reduce its tails, &c. Compare to standardization.
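e.g. A log transformation can pull in the long right tail of skewed data; an illustrative sketch with simulated values:

```python
# A log transformation makes right-skewed data more symmetrical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

print(f"skewness before: {stats.skew(skewed):.2f}")          # strongly right-skewed
print(f"skewness after:  {stats.skew(np.log(skewed)):.2f}")  # ~0, roughly symmetrical
```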
t-value: A t distribution’s equivalent of a z score for a normal distribution. Because of the t distribution’s fatter tails, the critical t-value sits further from the mean than the corresponding z score, so a t-value must be more extreme to reject the null hypothesis.
two-tailed test: An experiment that tests whether a value for a treatment group is not equal to the value for a control group. The “tails” are the rejection regions on either end of the distribution. The test statistic must fall into one of the tails in order to reject the null hypothesis.
V
variance: Standard deviation squared. σ².
Z
z score: Measures how many standard deviations away from the mean a value lies; used to set the upper and lower limits of a confidence interval. A test statistic used for a proportion, or for the mean of a population when that population’s standard deviation is known. The usual failure-to-reject range is between −2 and 2 (roughly 95% of the values in a normal distribution); a score outside that range typically leads you to reject the null hypothesis.
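Computed for a sample mean (hypothetical numbers, population standard deviation assumed known):

```python
# z score for a sample mean: how many standard errors it sits
# from the mean claimed by the null hypothesis.
import math

mu_null = 100.0      # population mean under the null hypothesis
sigma = 15.0         # known population standard deviation
sample_mean = 104.5  # hypothetical observed sample mean
n = 64

z = (sample_mean - mu_null) / (sigma / math.sqrt(n))
print(f"z = {z:.2f}")  # 2.40, outside ±1.96, so reject at alpha = 0.05
```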