Hypothesis Testing Glossary

Why So Weary?

When I try to read about statistics I get mired in the jargon. Even just moving past the phrase, “For a given parameterized distribution,” requires that I think about what it means for something to be “parameterized” and what a “distribution” is. I wind up reading in that plodding, word-by-word way that I might read a foreign language I happened to be studying. It’s exhausting.

Here I’ve gathered notes from the lessons I’ve done so far in my data science boot camp to define the salient terms.

The Basics

A hypothesis posits a relationship between two variables. A scientific experiment tests the hypothesis by comparing the value measured in a control group and the value measured in a treatment group. We assume that there is no relationship between the two variables. So, most of the time, the value taken from the treatment group will be close to the value taken from the control group. Once in a while, due to random chance, the value will be very different. If this chance is low enough, and our experiment shows it has happened, then we may conclude that our assumption about the lack of a relationship between the two variables was wrong. We cannot make the more satisfying conclusion that there is a relationship between the two, exactly; we can only conclude that there isn’t an absence of a relationship. The edifice of science rises by the gradual accumulation of cautious conclusions.
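
The reasoning above can be sketched as a small permutation test. This is only an illustration under stated assumptions: the measurements are made up, and `permutation_p_value` is a hypothetical helper name, not something from the boot camp lessons. It shuffles the group labels many times to see how often chance alone produces a difference as large as the observed one.

```python
import random
import statistics

def permutation_p_value(control, treatment, num_shuffles=5000, seed=0):
    """Estimate how often random relabeling gives a difference in means
    at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(treatment) - statistics.fmean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(num_shuffles):
        # Under the "no relationship" assumption, group labels are arbitrary.
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[n_control:]) - statistics.fmean(pooled[:n_control])
        )
        if diff >= observed:
            extreme += 1
    return extreme / num_shuffles

# Made-up measurements for illustration only.
control = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0]
treatment = [5.9, 6.1, 6.0, 5.8, 6.2, 6.0]

p_value = permutation_p_value(control, treatment)
print(f"estimated chance of a difference this large: {p_value:.4f}")
```

If the estimated chance is below the threshold chosen in advance, the "no relationship" assumption is rejected.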

Glossary

Use CTRL + F (Windows) or CMD + F (Mac) to hop to the term you want.

A

alpha value (α): The significance level, often 0.05. Choose one before running your study. Compare it to the P-value obtained from your study to decide whether to reject or fail to reject the null hypothesis. It is the (hopefully small) probability of a type-1 error, a “false positive” result which leads you to reject the null hypothesis even though it is correct.

alternative hypothesis (H1): “There is a relationship between” the variables being compared. Accepted after rejection of the null hypothesis. In mathematical notation, will always have <, >, or !=.

B

beta value (β): The probability of a type-2 error, a “false negative” result in which you fail to reject the null hypothesis even though it is wrong. The power of a test is 1 − β. Compare to alpha.

C

Central Limit Theorem: A convenient miracle: the means of many sufficiently large samples taken from a non-normal distribution (with finite variance) will themselves form an approximately normal distribution. Allows for the estimation of the parameters of a population distribution through the much-easier estimation of the parameters of this, the sampling distribution.
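
A quick simulation can make the theorem concrete. The sketch below assumes an exponential population (strongly skewed, with mean 1 and standard deviation 1) and a sample size of 50. Both choices, and the helper name `sample_means`, are illustrative.

```python
import random
import statistics

def sample_means(draw, sample_size, num_samples, seed=0):
    """Return the means of num_samples samples, each made of sample_size draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

# Population: exponential with rate 1 (skewed, mean 1, standard deviation 1).
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=50, num_samples=2000)

center = statistics.fmean(means)
spread = statistics.stdev(means)
print(f"mean of the sample means: {center:.3f}")  # should sit near 1
print(f"sd of the sample means:   {spread:.3f}")  # should sit near 1 / sqrt(50)
```

A histogram of `means` would look roughly bell-shaped even though the population itself is far from normal.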

confidence interval: A range of values, built around a point estimate, that is expected to contain the actual value of a parameter in the population. How often it does so is set by the confidence level.

confidence level: The percentage of a sampling distribution over which your desired confidence interval spreads. e.g. For a confidence level of 95%, the confidence interval will contain all values up to 1.96 standard deviations away from the mean of a normal distribution. (1.96 is also the z score here.) i.e. If you repeated the study many times, about 95% of the intervals built this way would contain the true mean of the population. But all this depends on your knowing the standard deviation of the population, which is rare.
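
As a rough sketch of where the 1.96 comes from and how the interval is built, the snippet below uses Python's standard-library `statistics.NormalDist`. The sample values and the "known" population standard deviation of 0.2 are made up, and `z_confidence_interval` is a hypothetical helper name.

```python
import math
import statistics

def z_confidence_interval(sample, population_sd, confidence=0.95):
    """Interval for the population mean when the population sd is known."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # z score that leaves (1 - confidence) / 2 in each tail; about 1.96 for 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * population_sd / math.sqrt(n)
    return mean - margin, mean + margin

# Made-up sample; population_sd=0.2 is assumed to be known for illustration.
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]
low, high = z_confidence_interval(sample, population_sd=0.2)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

When the population standard deviation is not known, the same shape of calculation is done with a t-value instead of a z score.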

critical value: Separates the rejection region from the failure-to-reject region. Determined by your chosen alpha (e.g. ±1.96 for a two-tailed z test at α = 0.05). Used in comparison to the test statistic.

D

degrees of freedom (df): Sample size minus 1 (n − 1) for a one-sample t test. The higher this is, the more normal a t distribution looks.

distribution: A set of values plotted as a curve. In probability, the values are the results of many trials of an event. The bulk of the values are often clustered symmetrically around the mean value while the rarer, outlying values are plotted along the “tails”. Can have discrete or continuous values; can have measurable “parameters”; can have a “normal” shape or a squatter “t” shape; can plot the values of a population or parameter values of samples from that population; &c.

F

failure-to-reject region: The set of values outside the rejection region, i.e. away from the tail or tails of a distribution. A test statistic that falls here means you fail to reject the null hypothesis.

G

Gaussian distribution: A normal distribution.

H

hypothesis: A prediction about a quantitative relationship between variables. Can be a proportion, e.g. “Some of…,” “Most of…,” or a mean. In statistics, it is written as two statements: a null hypothesis, with “=”, predicting no relationship between variables, and an alternative hypothesis, with “<,” “>,” or “!=,” predicting a relationship.

M

maximum likelihood estimation (MLE): A method to find a parameter (e.g. mean or standard deviation) of a distribution by plotting a likelihood function over the possible values of that parameter: how probable the observed data would be under each candidate value. At the maximum of this function, where the tangent line will have a slope of 0 (like a wooden board balanced perfectly level on the peak of a mountain), lies the estimate for the most likely value of the parameter of the original distribution.
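
A minimal sketch of the idea, assuming normally distributed data with a known standard deviation of 1 and a made-up five-value sample. Instead of calculus, it simply tries a grid of candidate means and keeps the one with the highest log-likelihood. The helper name `log_likelihood` is illustrative.

```python
import math

def log_likelihood(mu, data, sigma=1.0):
    """Log-likelihood of the data under a normal distribution with mean mu."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in data
    )

# Made-up observations for illustration.
data = [2.1, 1.9, 2.4, 2.0, 1.6]

# Candidate values for the mean: 0.00, 0.01, ..., 4.00.
candidates = [i / 100 for i in range(0, 401)]
best_mu = max(candidates, key=lambda mu: log_likelihood(mu, data))

sample_mean = sum(data) / len(data)
print(f"grid-search estimate: {best_mu:.2f}, sample mean: {sample_mean:.2f}")
```

For a normal distribution the peak of the likelihood lands on the sample mean, which is why the two printed numbers agree.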

mu, lower-case (μ): The mean (average) parameter of a distribution.

N

n: The size of a sample taken from a population.

normal distribution: The “bell curve” or Gaussian distribution. Its well-known parameters allow for easy prediction of the probability of the values that fall on its curve. The distributions of many phenomena in the world take this shape, even, amazingly, the distribution of the means of many samples taken from a non-normal distribution. See Central Limit Theorem.

null hypothesis (H0): “There is no relationship between” the variables being compared. In mathematical notation, will have an = sign. To “reject” it leads you to accept the alternative hypothesis, which is as close as you can get to proving a relationship between two variables. “Failure to reject” does not prove the absence of a relationship; it means only that your data gave too little evidence of one. (If you’re looking for statistical significance, rejection is “good”; failure to reject is “bad.”)

O

one-tailed test: An experiment that tests whether a value for a treatment group will be less than or greater than the value for a control group. The “tail” is the rejection region, the small range of values at one end of the distribution in which the test statistic must fall in order to reject the null hypothesis.

P

parameter: A measure, e.g. mean or variance, of a population distribution.

point estimate: A statistic (e.g. mean or variance) computed from a sample. May be used as a proxy for the same parameter of the sample’s overall population. As more and more of these accumulate, they begin to form a probability distribution. e.g. Many point estimates of the mean of a population form an approximately normal distribution which allows you to estimate the actual mean of that population with a specific confidence interval. See Central Limit Theorem.

P-value: The probability associated with the test statistic. (A different value than the proportion “p” used to write the null and alternative hypotheses.) A value between 0 and 1: the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one you observed. i.e. If your chosen alpha value is 0.05, then a P-value of 0.05 or smaller is enough to reject the null hypothesis, i.e. to conclude that there is a relationship between the two variables under observation.
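
To make the comparison with alpha concrete, here is a sketch of a two-tailed z test using the standard library. The numbers (a sample mean of 103 from 100 observations, a null-hypothesis mean of 100, a known population standard deviation of 15) are invented, and `two_tailed_p_value` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def two_tailed_p_value(sample_mean, null_mean, population_sd, n):
    """Return the z score and two-tailed P-value for a one-sample z test."""
    z = (sample_mean - null_mean) / (population_sd / math.sqrt(n))
    # Probability, under the null hypothesis, of a z score at least this far from 0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Invented numbers for illustration.
z, p = two_tailed_p_value(sample_mean=103.0, null_mean=100.0, population_sd=15.0, n=100)

alpha = 0.05
reject_null = p <= alpha
print(f"z = {z:.2f}, P-value = {p:.4f}, reject null: {reject_null}")
```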

R

rejection region: The set of values in one or both tails of a distribution in which the test statistic must fall in order to reject the null hypothesis.

R squared (R²): Proportion of variance explained. Value between 0 and 1. e.g. “91% of the variance in y can be explained by x.” Good for comparing multiple models fit to the same data. For a single model taken alone, there is no universal cutoff for a “good” value.
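
One common way to compute it is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition. The actual and predicted values are made up, and `r_squared` is an illustrative helper name.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up observations and model predictions.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 4.9]

score = r_squared(actual, predicted)
print(f"R squared: {score:.3f}")
```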

S

S: The standard deviation of a sample. Compare to σ, that of a population.

sample: A set of values of size n drawn from a population.

sampling distribution: Assembled by plotting a particular statistic from many samples taken from a population. e.g. Plot the means of many samples to create a sampling distribution and it will look close to normal.

sigma, lower-case (σ): The standard deviation parameter of a population distribution.

standardization: To put features on a similar scale for comparison. Does not change the shape of a distribution. Compare to transformation.

T

t distribution: Inspires less confidence than a normal distribution due to its higher standard deviation and resulting “fatter” tails. Used when a population’s standard deviation is not known. Its confidence interval is measured with a t-value in the absence of a standard deviation-derived z-score. As its degrees of freedom rise, its shape in fact approaches a normal distribution.

test statistic: Can be a t-value, z-score, &c. Used to measure where along the distribution the value taken from your study will fall. Compare to the critical value you chose to see if the value falls into the rejection region.

transformation: To normalize a distribution, make it more symmetrical, reduce its tails, &c. Compare to standardization.

t-value: A t-distribution’s equivalent to a z score for a normal distribution. Because a t distribution has fatter tails, for the same alpha its critical value lies farther from zero than that of a normal distribution, so a t-value must be more extreme to reject the null hypothesis. The gap shrinks as degrees of freedom rise.

two-tailed test: An experiment that tests whether a value for a treatment group is not equal to the value for a control group. The “tails” are the rejection regions on either end of the distribution. The test statistic must fall into one of the tails in order to reject the null hypothesis.

V

variance: Standard deviation squared. σ².

Z

z score: The number of standard deviations a value lies away from the mean. Measures the upper and lower limits of a confidence interval this way. A test statistic used for a proportion, or for the mean of a population when that population’s standard deviation is known. Usual range is between -2 and 2 (about 95% of the values in a normal distribution). A score outside that range will often lead you to reject the null hypothesis.
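
A small sketch of the arithmetic, using the usual formula z = (x − μ) / σ. The example value of 130 with a mean of 100 and a standard deviation of 15 is made up, and `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / sd

# Made-up example: a value of 130 in a distribution with mean 100 and sd 15.
z = z_score(130, mean=100, sd=15)

# Share of a normal distribution that lies within 1.96 sd of the mean.
standard_normal = NormalDist()
share_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"z score: {z:.2f}")
print(f"share of values within 1.96 sd of the mean: {share_within:.3f}")
```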


Last updated 6 years ago

A gorgeous explanation of MLE via StatQuest

sigma, upper-case (Σ): The sum of the elements of a population distribution.