
Google ML Crash Course


Linear Regression

True, the line doesn't pass through every dot, but the line does clearly show the relationship between chirps and temperature. Using the equation for a line, you could write down this relationship as follows:

$$y = mx + b$$

where:

  • $y$ is the temperature in Celsius, the value we're trying to predict.

  • $m$ is the slope of the line.

  • $x$ is the number of chirps per minute, the value of our input feature.

  • $b$ is the y-intercept.

By convention in machine learning, you'll write the equation for a model slightly differently:

$$y' = b + w_1 x_1$$

where:

  • $y'$ is the predicted label (a desired output).

  • $b$ is the bias (the y-intercept), sometimes referred to as $w_0$.

  • $w_1$ is the weight of feature 1. Weight is the same concept as the "slope" $m$ in the traditional equation of a line.

  • $x_1$ is a feature (a known input).

To infer (predict) the temperature $y'$ for a new chirps-per-minute value $x_1$, just substitute the $x_1$ value into this model.

Although this model uses only one feature, a more sophisticated model might rely on multiple features, each having a separate weight ($w_1$, $w_2$, etc.). For example, a model that relies on three features might look as follows:

$$y' = b + w_1 x_1 + w_2 x_2 + w_3 x_3$$
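In plain Python, that three-feature model is just a weighted sum plus a bias. A minimal sketch (the weights, bias, and feature values below are made up, for illustration only):

```python
def predict(features, weights, bias):
    """Return the model's prediction y' = b + w1*x1 + w2*x2 + w3*x3."""
    return bias + sum(w * x for w, x in zip(weights, features))

weights = [0.2, -1.3, 0.05]   # hypothetical w1, w2, w3
bias = 4.0                    # hypothetical b
features = [3.0, 1.0, 20.0]   # x1, x2, x3 for one example

print(predict(features, weights, bias))  # 4.0 + 0.6 - 1.3 + 1.0 = 4.3
```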

Squared loss: a popular loss function

The linear regression models we'll examine here use a loss function called squared loss (also known as L2 loss). The squared loss for a single example is as follows:

  = the square of the difference between the label and the prediction

  = (observation - prediction(x))^2

  = (y - y')^2

Mean square error (MSE) is the average squared loss per example over the whole dataset. To calculate MSE, sum up all the squared losses for individual examples and then divide by the number of examples:

$$MSE = \frac{1}{N} \sum_{(x,y) \in D} (y - \text{prediction}(x))^2$$

where:

  • $(x, y)$ is an example in which

    • $x$ is the set of features (for example, chirps/minute, age, gender) that the model uses to make predictions.

    • $y$ is the example's label (for example, temperature).

  • $\text{prediction}(x)$ is a function of the weights and bias in combination with the set of features $x$.

  • $D$ is a data set containing many labeled examples, which are $(x, y)$ pairs.

  • $N$ is the number of examples in $D$.

Although MSE is commonly used in machine learning, it is neither the only practical loss function nor the best loss function for all circumstances.
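MSE is also easy to compute directly, which makes for a good sanity check. A minimal sketch in plain Python (the toy labels and predictions are made up):

```python
def mse(labels, predictions):
    """Mean squared error: the average of (y - y')^2 over all examples."""
    assert len(labels) == len(predictions)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(labels, predictions)) / len(labels)

# Three labeled examples and a model's predictions for them.
print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1.0) / 3 ≈ 0.4167
```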

Reducing Loss

Weight Initialization

  • For convex problems, weights can start anywhere (say, all 0s)

    • Convex: think of a bowl shape

    • Just one minimum

  • Foreshadowing: not true for neural nets

    • Non-convex: think of an egg crate

    • More than one minimum

    • Strong dependency on initial values (see the sketch after this list)
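The "strong dependency on initial values" point is easy to demonstrate on a one-dimensional non-convex function. A minimal sketch (the function, learning rate, and starting points are made-up examples, not from the course):

```python
def descend(grad, x0, lr=0.01, steps=200):
    """Plain gradient descent from a given starting point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# f(x) = x^4 - 3x^2 + x is non-convex, with two local minima.
grad = lambda x: 4 * x**3 - 6 * x + 1

print(descend(grad, -2.0))  # settles near one minimum (x ≈ -1.30)
print(descend(grad, +2.0))  # settles near the other (x ≈ 1.13)
```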

SGD & Mini-Batch Gradient Descent

  • Could compute gradient over entire data set on each step, but this turns out to be unnecessary

  • Computing gradient on small data samples works well

    • On every step, get a new random sample

  • Stochastic Gradient Descent: one example at a time

  • Mini-Batch Gradient Descent: batches of 10-1000

    • Loss & gradients are averaged over the batch

Math

Note that TensorFlow handles all the gradient computations for you, so you don't actually have to understand the calculus provided here.
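For example, in TensorFlow 1.x (the version current when these notes were written) you can ask for a gradient without deriving it by hand; this tiny sketch uses a made-up one-variable loss:

```python
import tensorflow as tf  # assumes TensorFlow 1.x, matching the course era

x = tf.Variable(3.0)
loss = tf.square(x - 2.0)          # toy convex loss: (x - 2)^2
grad = tf.gradients(loss, [x])[0]  # TensorFlow derives d(loss)/dx for you

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))  # 2 * (3 - 2) = 2.0
```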

Partial derivatives

A multivariable function is a function with more than one argument, such as:

$$f(x, y) = e^{2y} \sin(x)$$

The partial derivative of $f$ with respect to $x$, denoted $\frac{\partial f}{\partial x}$, is the derivative of $f$ considered as a function of $x$ alone. To find it, you must hold $y$ constant (so $f$ is now a function of one variable $x$) and take the regular derivative of $f$ with respect to $x$. For example, when $y$ is fixed at 1, the preceding function becomes:

$$f(x) = e^2 \sin(x)$$

This is just a function of one variable $x$, whose derivative is:

$$e^2 \cos(x)$$

In general, thinking of $y$ as fixed, the partial derivative of $f$ with respect to $x$ is calculated as follows:

$$\frac{\partial f}{\partial x}(x, y) = e^{2y} \cos(x)$$

Similarly, if we hold $x$ fixed instead, the partial derivative of $f$ with respect to $y$ is:

$$\frac{\partial f}{\partial y}(x, y) = 2e^{2y} \sin(x)$$

Intuitively, a partial derivative tells you how much the function changes when you perturb one variable a bit. In the preceding example:

$$\frac{\partial f}{\partial x}(0, 1) = e^2 \approx 7.4$$

So when you start at $(0, 1)$, hold $y$ constant, and move $x$ a little, $f$ changes by about 7.4 times the amount that you changed $x$.

In machine learning, partial derivatives are mostly used in conjunction with the gradient of a function.
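That value is easy to verify numerically with a finite-difference quotient. A quick sketch (the step size `h` is an arbitrary small number):

```python
import math

def f(x, y):
    return math.exp(2 * y) * math.sin(x)

def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

print(partial_x(f, 0.0, 1.0))  # ≈ 7.389
print(math.exp(2))             # exact value: e^2 ≈ 7.389
```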

Gradients

The gradient of a function, denoted $\nabla f$, is the vector of partial derivatives with respect to all of the independent variables:

$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}(x, y),\ \frac{\partial f}{\partial y}(x, y) \right)$$

For instance, if:

$$f(x, y) = e^{2y} \sin(x)$$

then:

$$\nabla f(x, y) = \left( e^{2y} \cos(x),\ 2e^{2y} \sin(x) \right)$$

Note the following:

  • $\nabla f$ points in the direction of greatest increase of the function.

  • $-\nabla f$ points in the direction of greatest decrease of the function.

The number of dimensions in the vector is equal to the number of variables in the formula for $f$; in other words, the vector falls within the domain space of the function. For instance, the graph of the function:

$$f(x, y) = 4 + (x - 2)^2 + 2y^2$$

when viewed in three dimensions with $z = f(x, y)$, looks like a valley with a minimum at $(2, 0, 4)$.

The gradient of $f(x, y)$ is a two-dimensional vector that tells you in which $(x, y)$ direction to move for the maximum increase in height. Thus, the negative of the gradient moves you in the direction of maximum decrease in height. In other words, the negative of the gradient vector points into the valley.

In machine learning, gradients are used in gradient descent. We often have a loss function of many variables that we are trying to minimize, and we try to do this by following the negative of the gradient of the function.
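To see "the negative of the gradient points into the valley" in action, here is a small sketch that descends $f(x, y) = 4 + (x - 2)^2 + 2y^2$ (the starting point, learning rate, and step count are made up):

```python
def grad_f(x, y):
    """Gradient of f(x, y) = 4 + (x - 2)^2 + 2y^2, computed by hand."""
    return (2 * (x - 2), 4 * y)

x, y = -1.0, 3.0  # arbitrary starting point on the valley wall
lr = 0.1          # hypothetical learning rate
for _ in range(100):
    gx, gy = grad_f(x, y)
    x -= lr * gx  # step along -∇f
    y -= lr * gy

print(x, y)  # ≈ (2.0, 0.0): the bottom of the valley, where f = 4
```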

TensorFlow API Hierarchy
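The course presents TensorFlow as a stack of APIs, roughly: canned Estimators at the top; reusable libraries such as tf.layers, tf.losses, and tf.metrics below that; then the low-level Python API; the C++ core; and finally CPU/GPU/TPU hardware. The higher you work in the stack, the less plumbing you write yourself.

A minimal sketch at the Estimator level, assuming TensorFlow 1.x as used by the course; the toy chirps-to-temperature data is made up:

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

# Hypothetical toy data: chirps per minute -> temperature.
chirps = np.array([10.0, 12.0, 15.0, 18.0], dtype=np.float32)
temps = np.array([20.0, 22.0, 26.0, 30.0], dtype=np.float32)

# Top of the hierarchy: a canned Estimator supplies the model, loss, and training loop.
feature_columns = [tf.feature_column.numeric_column("chirps")]
model = tf.estimator.LinearRegressor(feature_columns=feature_columns)

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"chirps": chirps}, y=temps, batch_size=2, num_epochs=None, shuffle=True)
model.train(input_fn=train_input_fn, steps=200)
```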

https://developers.google.com/machine-learning/crash-course/