ML for Beginners (Video)
Section 1: Intro to ML
Types of Learning
- Predictive -> Supervised 
- Descriptive -> Unsupervised (We don't know the outcome) 
Term Comparison (Machine Learning -> Statistics)
- network, graphs, algorithms -> model
- weights -> parameters
- learning -> fitting
- supervised learning -> regression/classification
- unsupervised learning -> density estimation, clustering
Classification
Inputs -> Algorithm -> Class (Qualitative Output)
Prediction Function (y = g(x))
Regression
Input -> Algorithm -> Number (Quantitative Output)
Function Fitting (y = mx + b)
Unsupervised Learning
No labeled data.
Goal: find regularities in the input
Density Estimation (Statistics)
The input space is structured; as a result, certain patterns occur more often than others.
Clustering (ML)
Method for density estimation. Aim is to find clusters or groupings of inputs.
Multivariate Calculus
Best mechanism for talking about smooth changes algebraically.
- Optimization Problems (minimize error) 
- Probability Measurement (integration) 
Optimization via Gradient Descent
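The notes leave this section blank; as a minimal sketch, assuming Python/NumPy (the example function, step size, and iteration count are illustrative, not from the video), gradient descent minimizes an error function by repeatedly stepping against its gradient:

```python
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a function by stepping opposite its gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # move downhill
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=[0.0])
print(minimum)  # approaches 3.0
```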

Probability Calculations

Bayesian Inference
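Nothing was noted here; as a hedged placeholder, Bayesian inference updates a prior belief with observed evidence via Bayes' rule, posterior = likelihood x prior / evidence. The numbers below are hypothetical, chosen only to show the mechanics:

```python
# Bayes' rule on hypothetical numbers: a test that is 90% sensitive and
# 95% specific, for a condition with a 1% prior probability.
prior = 0.01
p_pos_given_true = 0.90   # likelihood of a positive test if condition holds
p_pos_given_false = 0.05  # false-positive rate

evidence = p_pos_given_true * prior + p_pos_given_false * (1 - prior)
posterior = p_pos_given_true * prior / evidence
print(posterior)  # ~0.154: a positive test raises 1% to about 15%
```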

Statistics and Probability Theory
We need statistics to...
- deal with uncertain events
- express probabilities in mathematical form
- estimate probabilities from data

The more data you have, the better your estimates will be.
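A quick illustration of that last point (Python/NumPy assumed; the probability value is made up for the demo): the empirical frequency of an event converges to its true probability as the sample grows, which is why more data helps.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = 0.3  # the probability we are trying to estimate

for n in (10, 100, 10_000):
    samples = rng.random(n) < true_p  # n Bernoulli(true_p) draws
    print(n, samples.mean())          # estimate improves with more data
```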
Linear Algebra

Minimum Linear Algebra Knowledge for ML
- Notation - Knowing linear algebra notation is essential for understanding the algorithm structure referenced in papers, books, etc.

- Operations - Working at the next level of abstraction, in vectors and matrices, is essential for ML. Learn to apply simple operations like adding, multiplying, inverting, and transposing matrices and vectors (see the sketch after this list).
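A minimal NumPy sketch of the operations listed above; NumPy is assumed here, the video does not prescribe a library:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([5.0, 6.0])

print(A + A)             # element-wise addition
print(A @ v)             # matrix-vector multiplication
print(A.T)               # transpose
print(np.linalg.inv(A))  # inverse (A must be non-singular)
```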
 
Recommended Resources
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics) by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Information Theory, Inference and Learning Algorithms by David J. C. MacKay
Section 2: Supervised Learning (part 1)
...inferring a function from labeled training data
Outputs
Qualitative (Classification)
Quantitative (Regression)
Terminology
- Generalization - how well our hypothesis will correctly classify future examples that are not part of the training set 
- Most Specific Hypothesis (S) - the tightest rectangle that includes all of the positive examples and none of the negative examples (sketched in code after this list)
- Most General Hypothesis (G) - the largest rectangle that includes all the positive examples and none of the negative examples 
- Doubt - a case that falls in between the most specific hypothesis (S) and the most general hypothesis (G) 
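A small illustration of S for axis-aligned rectangles (the data points are hypothetical): the tightest rectangle is just the coordinate-wise min and max of the positive examples.

```python
import numpy as np

# Hypothetical 2-D positive examples.
positives = np.array([[3.0, 4.0], [5.0, 2.0], [4.0, 3.5]])

# Most specific hypothesis S: the tightest axis-aligned rectangle
# containing every positive example.
lower = positives.min(axis=0)
upper = positives.max(axis=0)

def in_S(x):
    """True if x falls inside the rectangle S."""
    return bool(np.all(x >= lower) and np.all(x <= upper))

print(in_S(np.array([4.0, 3.0])))  # True: inside S
print(in_S(np.array([6.0, 3.0])))  # False: outside S
```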
Linear Methods for Classification
- Linear Models - Least Squares 
- Nearest Neighbors (kNN) 
 
Math Notation 
Least Squares -> Residual Sum of Squares (RSS)
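For reference, the residual sum of squares for a linear model with coefficients \(\beta\), in the notation of The Elements of Statistical Learning (listed above), is:

```latex
\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \bigl( y_i - x_i^{T} \beta \bigr)^{2}
```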
Nearest Neighbor (kNN)
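A minimal kNN classifier sketch (NumPy assumed; k and the Euclidean metric are illustrative defaults, not fixed by the video):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]              # indices of k closest
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]                 # majority class

# Tiny illustrative dataset: two classes in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1])))  # -> 0
```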
Linear Methods for Regression
Goal: learn a numerical function
Inputs
- quantitative 
- transformations of quantitative inputs (log, square-root, square) 
- polynomial representations (basis expansions) 
- interactions between variables 
Data Distribution Assumptions
- Inputs are fixed, or non random 
- Observations are uncorrelated and have constant variance 
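Putting the goal and assumptions above together, a least-squares fit can be sketched directly; the data here are synthetic, purely for illustration:

```python
import numpy as np

# Synthetic data: y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + 1 + rng.normal(0, 1, size=50)

X = np.column_stack([np.ones_like(x), x])     # add an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # coefficients minimizing RSS
print(beta)  # approximately [1, 2]
```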
Support Vector Machines (SVM)
(Kernel Machines)
- It is a discriminant-based method 
- The weight vector can be written in terms of a subset of the training set (the support vectors) 
- Kernel functions can be used to solve nonlinear cases 
- They present a convex optimization problem 
Vectorial Kernels
- polynomials of degree q 
- radial-basis functions (parameters chosen via cross-validation) 
- sigmoidal functions 
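The three kernel families above, written as plain functions of two input vectors; the hyperparameter names (q, gamma, a, b) are conventional choices, not fixed by the video:

```python
import numpy as np

def polynomial_kernel(x, z, q=2):
    """Polynomial kernel of degree q."""
    return (1 + x @ z) ** q

def rbf_kernel(x, z, gamma=1.0):
    """Radial-basis (Gaussian) kernel; gamma is typically tuned by cross-validation."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """Sigmoidal kernel."""
    return np.tanh(a * (x @ z) + b)
```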
Basis Expansions
The Big Idea!
Augment or replace the vector of inputs with additional variables, which are transformations of the inputs, and then use linear models in this new space of derived input features.
Linear Basis Expansion
Piecewise Polynomials and Splines
- Divide the domain of X into intervals 
- Represent with a separate basis function in each interval 
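A minimal sketch of the big idea using a polynomial basis (one of the expansions listed under regression; the data are synthetic): derive new features from x, then fit an ordinary linear model in the derived space.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = np.sin(3 * x) + rng.normal(0, 0.1, size=100)  # nonlinear target

# Basis expansion: replace x with the derived features [1, x, x^2, x^3].
H = np.column_stack([x**d for d in range(4)])
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # linear model in the new space
print(beta)
```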

Model Selection Procedures

Inductive Bias
- Assuming linear function 
- Minimizing Squared Error 
Choosing the right bias is called model selection
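One standard model selection procedure is cross-validation: hold out part of the data, fit on the rest, and keep the bias with the lowest held-out error. A hedged sketch, assuming NumPy and the polynomial basis idea above:

```python
import numpy as np

def cv_error(x, y, degree, folds=5):
    """Average held-out squared error of a degree-`degree` polynomial fit."""
    idx = np.arange(len(x))
    errors = []
    for f in range(folds):
        test = idx % folds == f  # every folds-th point held out
        H = np.column_stack([x**d for d in range(degree + 1)])
        beta, *_ = np.linalg.lstsq(H[~test], y[~test], rcond=None)
        errors.append(np.mean((H[test] @ beta - y[test]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = np.sin(3 * x) + rng.normal(0, 0.1, size=100)

# Pick the inductive bias (polynomial degree) with the lowest CV error.
best_degree = min(range(1, 8), key=lambda d: cv_error(x, y, d))
print(best_degree)
```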

