ML for Beginners (Video)
Section 1: Intro to ML
Types of Learning
Predictive -> Supervised
Descriptive -> Unsupervised (We don't know the outcome)
Term Comparison
Machine Learning | Statistics
network, graphs, algorithms | model
weights | parameters
learning | fitting
supervised learning | regression/classification
unsupervised learning | density estimation, clustering
Classification
Inputs -> Algorithm -> Class (Qualitative Output)
Prediction Function (y = g(x))
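A minimal sketch of a prediction function y = g(x) for classification, assuming numpy; the data, the two classes, and the nearest-centroid rule are illustrative choices, not from the video:

```python
import numpy as np

# Toy labeled inputs: two classes of 2-D points (values are made up)
X = np.array([[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]])
y = np.array([0, 0, 1, 1])

# "Learn" one centroid per class from the training data
classes = np.unique(y)
centroids = np.array([X[y == c].mean(axis=0) for c in classes])

def g(x):
    """Prediction function: return the class whose centroid is closest."""
    distances = np.linalg.norm(centroids - x, axis=1)
    return classes[np.argmin(distances)]

print(g(np.array([0.9, 1.1])))  # -> 0 (qualitative output)
print(g(np.array([3.1, 3.0])))  # -> 1
```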
Regression
Input -> Algorithm -> Number (Quantitative Output)
Function Fitting (y = mx + b)
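A minimal function-fitting sketch, assuming numpy; the data values are made up, and np.polyfit stands in for whichever fitting procedure the video uses:

```python
import numpy as np

# Toy quantitative data roughly following y = 2x + 1 (values are made up)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Least-squares fit of a degree-1 polynomial: y = m*x + b
m, b = np.polyfit(x, y, deg=1)

def predict(x_new):
    return m * x_new + b

print(m, b)          # slope and intercept, close to 2 and 1
print(predict(5.0))  # quantitative output for a new input
```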
Unsupervised Learning
No labeled data.
Goal: find regularities in the input
Density Estimation (Statistics)
The input space is structured; as a result, certain patterns occur more often than others.
Clustering (ML)
Method for density estimation. Aim is to find clusters or groupings of inputs.
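A minimal clustering sketch, assuming scikit-learn is available; the points and the choice of k = 2 are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

# Unlabeled inputs with two loose groupings (values are made up)
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

# k-means looks for k groupings of the inputs; no labels are given
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.labels_)           # cluster assignment for each input
print(km.cluster_centers_)  # one center per discovered grouping
```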
Multivariate Calculus
Best mechanism for talking about smooth changes algebraically.
Optimization Problems (minimize error)
Probability Measurement (integration)
Optimization via Gradient Descent (see the sketch after this list)
Probability Calculations
Bayesian Inference
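A minimal gradient-descent sketch in plain Python; the objective, learning rate, and iteration count are illustrative choices, not from the video:

```python
# Minimize the squared error f(w) = (w - 3)^2 by gradient descent.
# The derivative f'(w) = 2*(w - 3) points uphill, so we step the other way.
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting point
lr = 0.1   # learning rate (step size), chosen for illustration
for _ in range(100):
    w -= lr * grad(w)

print(w)  # converges toward the minimizer w = 3
```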
Statistics and Probability Theory
We need statistics to...
deal with uncertain events
formulate probabilities mathematically
estimate probabilities from data
The more data you have, the better.
Linear Algebra
Minimum Linear Algebra Knowledge for ML
Notation
Knowing linear algebra notation is essential for understanding the structure of the algorithms described in papers, books, etc.
Operations
Working at the next level of abstraction, with vectors and matrices, is essential for ML. Learn to apply simple operations to matrices and vectors: addition, multiplication, transposition, inversion, etc.
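A short numpy sketch of exactly those operations; the matrix and vector values are arbitrary:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # a 2x2 matrix
v = np.array([1.0, 2.0])                # a vector

print(A + A)             # element-wise addition
print(A @ v)             # matrix-vector multiplication
print(A.T)               # transpose
print(np.linalg.inv(A))  # inverse (A must be non-singular)

# Check: A times its inverse gives the identity (up to rounding)
print(A @ np.linalg.inv(A))
```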
Recommended Resources
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics) by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Information Theory, Inference and Learning Algorithms by David J. C. MacKay
Section 2: Supervised Learning (part 1)
...inferring a function from labeled training data
Outputs
Qualitative (Classification)
Quantitative (Regression)
Terminology
Generalization - how well our hypothesis will correctly classify future examples that are not part of the training set
Most Specific Hypothesis (S) - the tightest rectangle that includes all of the positive examples and none of the negative examples
Most General Hypothesis (G) - the largest rectangle that includes all the positive examples and none of the negative examples
Doubt - a case that falls in between the most specific hypothesis (S) and the most general hypothesis (G)
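A minimal sketch of these definitions, assuming numpy and axis-aligned rectangles; the positive examples are made up, and G is hard-coded here only because computing the most general rectangle needs the negative examples too:

```python
import numpy as np

# Positive examples in 2-D (values are made up)
positives = np.array([[2.0, 2.0], [3.0, 4.0], [4.0, 3.0]])

# Most specific hypothesis S: tightest rectangle around the positives
s_lo, s_hi = positives.min(axis=0), positives.max(axis=0)

# Most general hypothesis G: widest rectangle that still excludes all
# negatives; hard-coded here for illustration
g_lo, g_hi = np.array([1.0, 1.0]), np.array([5.0, 5.0])

def inside(x, lo, hi):
    return bool(np.all(lo <= x) and np.all(x <= hi))

def classify(x):
    if inside(x, s_lo, s_hi):
        return "positive"  # inside S: every consistent hypothesis agrees
    if inside(x, g_lo, g_hi):
        return "doubt"     # between S and G: hypotheses disagree
    return "negative"      # outside G

print(classify(np.array([3.0, 3.0])))  # positive
print(classify(np.array([1.5, 1.5])))  # doubt
print(classify(np.array([6.0, 6.0])))  # negative
```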
Linear Methods for Classification
Linear Models
Least Squares
Nearest Neighbors (kNN)
Math Notation
Least Squares -> Residual Sum of Squares (RSS); written out below
Nearest Neighbor (kNN)
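Both in the standard textbook forms (as in The Elements of Statistical Learning):

```latex
% Least squares: choose \beta to minimize the residual sum of squares
\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \bigl(y_i - x_i^{\top}\beta\bigr)^2

% k-nearest neighbors: average the outputs of the k closest training points
\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i
```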
Linear Methods for Regression
Goal: learn a numerical function
Inputs
quantitative
transformations of quantitative inputs (log, square-root, square)
polynomial representations (basis expansions)
interactions between variables
Data Distribution Assumptions
Inputs are fixed, or non-random
Observations are uncorrelated and have constant variance
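A minimal least-squares sketch, assuming numpy; it includes one transformed input (the square) from the list above, and the data values are made up:

```python
import numpy as np

# Toy data roughly following y = 1 + 2x + 0.5x^2 (values are made up)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 3.4, 7.1, 11.4, 17.2])

# Design matrix: an intercept, the raw input, and a squared transformation
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares coefficients via lstsq (more stable than inverting X^T X)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # roughly [1, 2, 0.5]
```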
Support Vector Machines (SVM)
(Kernel Machines)
It is a discriminant-based method
The weight vector can be written in terms of a subset of the training set (the support vectors)
Kernel functions can be used to solve nonlinear cases
Presents a convex optimization problem
Vectorial Kernels
polynomials of degree q
radial-basis functions (tune the width via cross-validation)
sigmoidal functions
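The three kernels as plain functions, assuming numpy; the hyperparameter values (q, gamma, a, b) are illustrative and would normally be tuned, e.g. the RBF width via cross-validation:

```python
import numpy as np

def polynomial_kernel(x, z, q=2):
    """Polynomial kernel of degree q."""
    return (x @ z + 1.0) ** q

def rbf_kernel(x, z, gamma=0.5):
    """Radial-basis function kernel; gamma controls the width."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """Sigmoidal kernel."""
    return np.tanh(a * (x @ z) + b)

x, z = np.array([1.0, 2.0]), np.array([0.5, 1.5])
print(polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
```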
Basis Expansions
The Big Idea!
Augment or replace the vector of inputs with additional variables, which are transformations of the inputs, and then use linear models in this new space of derived input features.
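A minimal sketch of this idea, assuming scikit-learn; PolynomialFeatures builds the derived inputs (1, x, x^2) and LinearRegression then fits a model that is linear in that new feature space, not in x:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures  # assumes scikit-learn
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Nonlinear toy data (values are made up): y depends on x quadratically
X = np.linspace(-2, 2, 20).reshape(-1, 1)
y = 1.0 + X[:, 0] ** 2 + 0.1 * np.random.default_rng(0).normal(size=20)

# Replace x with derived features (1, x, x^2), then use a linear model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict(np.array([[1.5]])))  # close to 1 + 1.5^2 = 3.25
```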
Linear Basis Expansion
Piecewise Polynomials and Splines
Divide the domain of X into intervals
Represent with a separate basis function in each interval
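A minimal piecewise-constant sketch, assuming numpy; the knots, the data, and the indicator basis are illustrative:

```python
import numpy as np

# Toy data (values are made up)
x = np.linspace(0.0, 3.0, 30)
y = np.sin(x) + 0.1 * np.random.default_rng(1).normal(size=30)

# Divide the domain of x into intervals at two knots
knots = [1.0, 2.0]
bins = np.digitize(x, knots)  # interval index for each point

# Piecewise-constant fit: one basis function (an indicator) per interval,
# whose least-squares coefficient is the mean of y in that interval
coef = np.array([y[bins == b].mean() for b in range(len(knots) + 1)])

def predict(x_new):
    return coef[np.digitize(x_new, knots)]

print(predict(np.array([0.5, 1.5, 2.5])))
```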
Model Selection Procedures
Inductive Bias
Assuming linear function
Minimizing Squared Error
Choosing the right bias is called model selection