ML for Beginners (Video)

Section 1: Intro to ML

Types of Learning

  • Predictive -> Supervised

  • Descriptive -> Unsupervised (We don't know the outcome)

Term Comparison

Machine Learning            | Statistics
network, graphs, algorithms | model
weights                     | parameters
learning                    | fitting
supervised learning         | regression/classification
unsupervised learning       | density estimation, clustering

Classification

Inputs -> Algorithm -> Class (Qualitative Output)

Prediction Function (y = g(x))

Regression

Input -> Algorithm -> Number (Quantitative Output)

Function Fitting (y = mx + b)
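A minimal sketch of function fitting, assuming NumPy and synthetic data (not from the video): recover m and b from noisy samples of y = mx + b by least squares.

```python
# Fit y = mx + b to noisy synthetic data (true m = 2, b = 1).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

m, b = np.polyfit(x, y, deg=1)  # degree-1 polynomial = a line
print(f"m ~ {m:.2f}, b ~ {b:.2f}")
```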

Unsupervised Learning

No labeled data.

Goal: find regularities in the input

Density Estimation (Statistics)

The input space is structured; as a result, certain patterns occur more often than others.

Clustering (ML)

Method for density estimation. Aim is to find clusters or groupings of inputs.
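A rough sketch of clustering via k-means in plain NumPy; the 2-D synthetic data, k = 2, and the fixed iteration count are illustrative assumptions, and a real project would more likely use a library implementation.

```python
# Minimal k-means: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(iters):
        # distance from every point to every center -> nearest-center labels
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        # recompute each center as the mean of its cluster
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [3, 3])])
centers, labels = kmeans(X, k=2)
print(centers)  # should land near (0, 0) and (3, 3)
```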

Multivariate Calculus

Best mechanism for talking about smooth changes algebraically.

  • Optimization Problems (minimize error)

  • Probability Measurement (integration)

Optimization via Gradient Descent

Probability Calculations

Bayesian Inference
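As a concrete example of the gradient-descent point above, here is a minimal sketch that minimizes a squared error by stepping against the gradient; the loss (w - 3)^2 and the learning rate are arbitrary choices for illustration.

```python
# Minimize L(w) = (w - 3)^2 via gradient descent; dL/dw = 2(w - 3).
w = 0.0
lr = 0.1
for step in range(100):
    grad = 2 * (w - 3)  # derivative of the squared error
    w -= lr * grad      # step downhill
print(w)  # converges toward the minimizer w = 3
```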

Statistics and Probability Theory

We need statistics to...

  • deal with uncertain events

  • formulate probabilities mathematically

  • estimate probabilities from data

The more data you have, the better.
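A tiny sketch of estimating a probability from data, illustrating why more data helps; the coin bias of 0.7 is an arbitrary assumption.

```python
# The empirical frequency of heads tightens around the true bias as n grows.
import numpy as np

rng = np.random.default_rng(0)
true_p = 0.7
for n in (10, 100, 10_000):
    flips = rng.random(n) < true_p   # n simulated coin flips
    print(n, flips.mean())           # estimate of true_p from the data
```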

Linear Algebra

Minimum Linear Algebra Knowledge for ML

  • Notation

    • Knowing linear algebra notation is essential for understanding the algorithm structure referenced in papers, books, etc.

  • Operations

    • Working at the next level of abstraction, with vectors and matrices, is essential for ML. Learn to apply simple operations like adding, multiplying, inverting, and transposing matrices and vectors (see the sketch below).
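A quick NumPy tour of those operations (the library choice is mine, not the video's):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
v = np.array([1.0, 2.0])

print(A + A)             # matrix addition
print(A @ v)             # matrix-vector multiplication
print(A.T)               # transpose
print(np.linalg.inv(A))  # inverse (A must be square and non-singular)
```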

Recommended Reading

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics) by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

Information Theory, Inference and Learning Algorithms by David J. C. MacKay

Section 2: Supervised Learning (part 1)

...inferring a function from labeled training data

Outputs

  • Qualitative (Classification)

  • Quantitative (Regression)

Terminology

  • Generalization - how well our hypothesis will correctly classify future examples that are not part of the training set

  • Most Specific Hypothesis (S) - the tightest rectangle that includes all of the positive examples and none of the negative examples

  • Most General Hypothesis (G) - the largest rectangle that includes all the positive examples and none of the negative examples

  • Doubt - a case that falls in between the most specific hypothesis (S) and the most general hypothesis (G)

Linear Methods for Classification

  • Linear Models

    • Least Squares

    • Nearest Neighbors (kNN)

Math Notation

X^T = (X_1, X_2, \dots, X_n)

\hat{Y} = \hat{\beta_0} + \sum^n_{j=1} X_j \hat{\beta_j}

\hat{Y} = X^T \hat{\beta}

Least Squares -> Residual Sum of Squares (RSS)
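A minimal sketch of the linear model above fit by least squares, i.e. choosing \hat{\beta} to minimize the RSS; the synthetic data and true coefficients are arbitrary.

```python
# Solve for beta-hat minimizing RSS(beta) = sum_i (y_i - x_i^T beta)^2.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # intercept + 2 inputs
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
rss = np.sum((y - X @ beta_hat) ** 2)             # residual sum of squares
print(beta_hat, rss)
```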

Nearest Neighbor (kNN)

\hat{Y}(X) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i
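A direct translation of the kNN formula into NumPy, assuming Euclidean distance and toy 1-D training data:

```python
# Predict y-hat(x) as the average target of the k nearest training points.
import numpy as np

def knn_predict(x, X_train, y_train, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return y_train[nearest].mean()               # average their y values

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 1.1, 1.9, 3.2, 4.1])
print(knn_predict(np.array([2.5]), X_train, y_train, k=3))
```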

Linear Methods for Regression

Goal: learn a numerical function

Inputs

  • quantitative

  • transformations of quantitative inputs (log, square-root, square)

  • polynomial representations (basis expansions)

  • interactions between variables

Data Distribution Assumptions

  • Inputs x are fixed, or non-random

  • Observations y are uncorrelated and have constant variance

Support Vector Machines (SVM)

(Kernel Machines)

  • It is a discriminant-based method

  • The weight vector can be written in terms of a subset of the training set (the support vectors)

  • Kernel functions can be used to solve nonlinear cases

  • Presents a convex optimization problem

Vectorial Kernels

  • polynomials of degree q

  • radial-basis functions (use cross validation)

  • sigmoidal functions
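A hedged sketch of these three kernels using scikit-learn's SVC (assuming scikit-learn is installed; the data is synthetic): kernel="poly", "rbf", and "sigmoid" correspond to the polynomial, radial-basis, and sigmoidal kernels listed above.

```python
# Compare the three vectorial kernels on a nonlinearly separable toy problem.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # circular decision boundary

for kernel in ("poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel, degree=3, gamma="scale").fit(X, y)
    # training accuracy and number of support vectors
    print(kernel, clf.score(X, y), len(clf.support_))
```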

Basis Expansions

The Big Idea!

Augment or replace the vector of inputs with additional variables, which are transformations of the inputs, and then use linear models in this new space of derived input features.

Linear Basis Expansion

f(X) = \sum^M_{m=1} \beta_m h_m(X)
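A minimal sketch of a linear basis expansion with a polynomial basis h_m(X) = X^m; the degree and the synthetic data are arbitrary choices.

```python
# Expand x into polynomial features, then fit a linear model in that space.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 100)
y = np.sin(2 * x) + rng.normal(scale=0.1, size=x.shape)

H = np.column_stack([x ** m for m in range(4)])  # h_m(x) = x^m, m = 0..3
beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # linear least squares in the new features
f = H @ beta                                     # f(x) = sum_m beta_m h_m(x)
print(beta)
```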

Piecewise Polynomials and Splines

  • Divide the domain of X into intervals

  • Represent f(X) with a separate basis function in each interval (sketched below with an indicator basis)
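A minimal sketch of the piecewise idea using the simplest case, an indicator basis per interval, which yields a piecewise-constant fit; splines refine this with smooth polynomial pieces. The knots and data are arbitrary.

```python
# Piecewise-constant basis: one indicator function per interval of the domain.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 3, 150))
y = np.sin(2 * x) + rng.normal(scale=0.1, size=x.shape)

knots = np.array([0, 1, 2, 3])                # interval boundaries
labels = np.digitize(x, knots[1:-1])          # which interval each x falls in
H = np.eye(len(knots) - 1)[labels]            # indicator basis h_m(x)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # beta_m = mean of y within interval m
print(beta)
```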

Model Selection Procedures

Inductive Bias

  • Assuming linear function

  • Minimizing Squared Error

Choosing the right bias is called model selection.
