ML for Beginners (Video)
Section 1: Intro to ML
Types of Learning
Predictive -> Supervised
Descriptive -> Unsupervised (We don't know the outcome)
Term Comparison
Machine Learning | Statistics
network, graphs, algorithms | model
weights | parameters
learning | fitting
supervised learning | regression/classification
unsupervised learning | density estimation, clustering
Classification
Inputs -> Algorithm -> Class (Qualitative Output)
Prediction Function (y = g(x))
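A minimal sketch of a prediction function y = g(x) for classification, assuming numpy; the data, the two classes, and the nearest-centroid rule are illustrative choices, not from the video:

```python
import numpy as np

# Toy labeled inputs: two classes of 2-D points (values are made up)
X = np.array([[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]])
y = np.array([0, 0, 1, 1])

# "Learn" one centroid per class from the training data
classes = np.unique(y)
centroids = np.array([X[y == c].mean(axis=0) for c in classes])

def g(x):
    """Prediction function: return the class whose centroid is closest."""
    distances = np.linalg.norm(centroids - x, axis=1)
    return classes[np.argmin(distances)]

print(g(np.array([0.9, 1.1])))  # -> 0 (qualitative output)
print(g(np.array([3.1, 3.0])))  # -> 1
```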
Regression
Input -> Algorithm -> Number (Quantitative Output)
Function Fitting (y = mx + b)
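A minimal function-fitting sketch, assuming numpy; the data values are made up, and np.polyfit stands in for whichever fitting procedure the video uses:

```python
import numpy as np

# Toy quantitative data roughly following y = 2x + 1 (values are made up)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Least-squares fit of a degree-1 polynomial: y = m*x + b
m, b = np.polyfit(x, y, deg=1)

def predict(x_new):
    return m * x_new + b

print(m, b)          # slope and intercept, close to 2 and 1
print(predict(5.0))  # quantitative output for a new input
```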
Unsupervised Learning
No labeled data.
Goal: find regularities in the input
Density Estimation (Statistics)
The input space is structured; as a result, certain patterns occur more often than others.
Clustering (ML)
Method for density estimation. Aim is to find clusters or groupings of inputs.
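A minimal clustering sketch, assuming scikit-learn is available; the points and the choice of k = 2 are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

# Unlabeled inputs with two loose groupings (values are made up)
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

# k-means looks for k groupings of the inputs; no labels are given
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.labels_)           # cluster assignment for each input
print(km.cluster_centers_)  # one center per discovered grouping
```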
Multivariate Calculus
Best mechanism for talking about smooth changes algebraically.
Optimization Problems (minimize error)
Probability Measurement (integration)
Optimization via Gradient Descent (see the sketch after this list)
Probability Calculations
Bayesian Inference
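A minimal gradient-descent sketch in plain Python; the objective, learning rate, and iteration count are illustrative choices, not from the video:

```python
# Minimize the squared error f(w) = (w - 3)^2 by gradient descent.
# The derivative f'(w) = 2*(w - 3) points uphill, so we step the other way.
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting point
lr = 0.1   # learning rate (step size), chosen for illustration
for _ in range(100):
    w -= lr * grad(w)

print(w)  # converges toward the minimizer w = 3
```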
Statistics and Probability Theory
We need statistics to...
deal with uncertain events
formulate probabilities mathematically
estimate probabilities from data
The more data you have, the better.
Linear Algebra
Minimum Linear Algebra Knowledge for ML
Notation
Knowing linear algebra notation is essential for understanding the structure of the algorithms described in papers, books, etc.
Operations
Working at the next level of abstraction, with vectors and matrices, is essential for ML. Learn to apply simple operations to matrices and vectors: addition, multiplication, transposition, inversion, etc.
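A short numpy sketch of exactly those operations; the matrix and vector values are arbitrary:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # a 2x2 matrix
v = np.array([1.0, 2.0])                # a vector

print(A + A)             # element-wise addition
print(A @ v)             # matrix-vector multiplication
print(A.T)               # transpose
print(np.linalg.inv(A))  # inverse (A must be non-singular)

# Check: A times its inverse gives the identity (up to rounding)
print(A @ np.linalg.inv(A))
```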
Recommended Resources
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics) by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Information Theory, Inference and Learning Algorithms by David J. C. MacKay
Section 2: Supervised Learning (part 1)
...inferring a function from labeled training data
Outputs
Qualitative (Classification)
Quantitative (Regression)
Terminology
Generalization - how well our hypothesis will correctly classify future examples that are not part of the training set
Most Specific Hypothesis (S) - the tightest rectangle that includes all of the positive examples and none of the negative examples
Most General Hypothesis (G) - the largest rectangle that includes all the positive examples and none of the negative examples
Doubt - a case that falls in between the most specific hypothesis (S) and the most general hypothesis (G)
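A minimal sketch of these definitions, assuming numpy and axis-aligned rectangles; the positive examples are made up, and G is hard-coded here only because computing the most general rectangle needs the negative examples too:

```python
import numpy as np

# Positive examples in 2-D (values are made up)
positives = np.array([[2.0, 2.0], [3.0, 4.0], [4.0, 3.0]])

# Most specific hypothesis S: tightest rectangle around the positives
s_lo, s_hi = positives.min(axis=0), positives.max(axis=0)

# Most general hypothesis G: widest rectangle that still excludes all
# negatives; hard-coded here for illustration
g_lo, g_hi = np.array([1.0, 1.0]), np.array([5.0, 5.0])

def inside(x, lo, hi):
    return bool(np.all(lo <= x) and np.all(x <= hi))

def classify(x):
    if inside(x, s_lo, s_hi):
        return "positive"  # inside S: every consistent hypothesis agrees
    if inside(x, g_lo, g_hi):
        return "doubt"     # between S and G: hypotheses disagree
    return "negative"      # outside G

print(classify(np.array([3.0, 3.0])))  # positive
print(classify(np.array([1.5, 1.5])))  # doubt
print(classify(np.array([6.0, 6.0])))  # negative
```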
Linear Methods for Classification
Linear Models
Least Squares
Nearest Neighbors (kNN)
Math Notation
Least Squares -> Residual Sum of Squares (RSS); written out below
Nearest Neighbor (kNN)
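Both in the standard textbook forms (as in The Elements of Statistical Learning):

```latex
% Least squares: choose \beta to minimize the residual sum of squares
\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \bigl(y_i - x_i^{\top}\beta\bigr)^2

% k-nearest neighbors: average the outputs of the k closest training points
\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i
```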
Linear Methods for Regression
Goal: learn a numerical function
Inputs
quantitative
transformations of quantitative inputs (log, square-root, square)
polynomial representations (basis expansions)
interactions between variables
Data Distribution Assumptions
Inputs are fixed, or non-random
Observations are uncorrelated and have constant variance
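A minimal least-squares sketch, assuming numpy; it includes one transformed input (the square) from the list above, and the data values are made up:

```python
import numpy as np

# Toy data roughly following y = 1 + 2x + 0.5x^2 (values are made up)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 3.4, 7.1, 11.4, 17.2])

# Design matrix: an intercept, the raw input, and a squared transformation
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares coefficients via lstsq (more stable than inverting X^T X)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # roughly [1, 2, 0.5]
```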
Support Vector Machines (SVM)
(Kernel Machines)
It is a discriminant-based method
The weight vector can be written in terms of a subset of the training set (the support vectors)
Kernel functions can be used to solve nonlinear cases
Presents a convex optimization problem
Vectorial Kernels
polynomials of degree q
radial-basis functions (tune the width via cross-validation)
sigmoidal functions
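The three kernels as plain functions, assuming numpy; the hyperparameter values (q, gamma, a, b) are illustrative and would normally be tuned, e.g. the RBF width via cross-validation:

```python
import numpy as np

def polynomial_kernel(x, z, q=2):
    """Polynomial kernel of degree q."""
    return (x @ z + 1.0) ** q

def rbf_kernel(x, z, gamma=0.5):
    """Radial-basis function kernel; gamma controls the width."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """Sigmoidal kernel."""
    return np.tanh(a * (x @ z) + b)

x, z = np.array([1.0, 2.0]), np.array([0.5, 1.5])
print(polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
```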
Basis Expansions
The Big Idea!
Augment or replace the vector of inputs with additional variables, which are transformations of the inputs, and then use linear models in this new space of derived input features.
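A minimal sketch of this idea, assuming scikit-learn; PolynomialFeatures builds the derived inputs (1, x, x^2) and LinearRegression then fits a model that is linear in that new feature space, not in x:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures  # assumes scikit-learn
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Nonlinear toy data (values are made up): y depends on x quadratically
X = np.linspace(-2, 2, 20).reshape(-1, 1)
y = 1.0 + X[:, 0] ** 2 + 0.1 * np.random.default_rng(0).normal(size=20)

# Replace x with derived features (1, x, x^2), then use a linear model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict(np.array([[1.5]])))  # close to 1 + 1.5^2 = 3.25
```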
Linear Basis Expansion
Piecewise Polynomials and Splines
Divide the domain of X into intervals
Represent with a separate basis function in each interval
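A minimal piecewise-constant sketch, assuming numpy; the knots, the data, and the indicator basis are illustrative:

```python
import numpy as np

# Toy data (values are made up)
x = np.linspace(0.0, 3.0, 30)
y = np.sin(x) + 0.1 * np.random.default_rng(1).normal(size=30)

# Divide the domain of x into intervals at two knots
knots = [1.0, 2.0]
bins = np.digitize(x, knots)  # interval index for each point

# Piecewise-constant fit: one basis function (an indicator) per interval,
# whose least-squares coefficient is the mean of y in that interval
coef = np.array([y[bins == b].mean() for b in range(len(knots) + 1)])

def predict(x_new):
    return coef[np.digitize(x_new, knots)]

print(predict(np.array([0.5, 1.5, 2.5])))
```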
Model Selection Procedures
Inductive Bias
Assuming linear function
Minimizing Squared Error
Choosing the right bias is called model selection