Neural Networks and Deep Learning
deeplearning.ai @ coursera
Scale of data, computation and algorithms.
Binary Classification
In Python, `X.shape` is `(n_x, m)` and `Y.shape` is `(1, m)`, where `n_x` is the number of input features and `m` is the number of training examples.
Logistic Regression
Gradient Descent
The negative sign should apply to the entire cost function (both terms in the summation).
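Written out, the logistic regression cost function from the lectures, with the negative sign distributed over both terms of the summation:

$$J(w, b) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)}\log \hat{y}^{(i)} + \big(1-y^{(i)}\big)\log\big(1-\hat{y}^{(i)}\big)\Big]$$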
Derivatives
Derivative => Slope of Lines
Computation Graph
Logistic Regression Gradient Descent
When you're implementing deep learning algorithms, you'll find that having explicit for loops in your code makes your algorithm run less efficiently. In the deep learning era, we're moving to bigger and bigger datasets, so being able to implement your algorithms without explicit for loops is really important and will help you scale to much bigger datasets. It turns out there is a set of techniques, called vectorization, that allows you to get rid of these explicit for loops in your code.
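A minimal sketch of the idea: computing the dot product $w \cdot x$ with an explicit loop versus a single NumPy call (the arrays here are made-up random data for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.random(1_000_000)
x = rng.random(1_000_000)

# Non-vectorized: explicit for loop over every element.
total = 0.0
for i in range(len(w)):
    total += w[i] * x[i]

# Vectorized: one call into NumPy's optimized routines (SIMD/BLAS).
vectorized = np.dot(w, x)

# Both give the same answer; on arrays this size, the vectorized
# version is typically orders of magnitude faster.
assert np.isclose(total, vectorized)
```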
Derivation of $\frac{dL}{dz}$
If you're curious, here is the derivation for $\frac{dL}{dz} = a - y$.
Note that in this part of the course, Andrew refers to $\frac{dL}{dz}$ as $dz$.
By the chain rule: $\frac{dL}{dz} = \frac{dL}{da} \cdot \frac{da}{dz}$
We'll do the following: 1. solve for $\frac{dL}{da}$, then 2. solve for $\frac{da}{dz}$, then 3. multiply them together.
Step 1: $\frac{dL}{da}$
Recall the loss for a single example: $L = -\big(y\log(a) + (1-y)\log(1-a)\big)$
We're taking the derivative with respect to a: $\frac{dL}{da} = -\frac{y}{a} + \frac{1-y}{1-a}$
Remember that there is an additional $-1$ in the last term when we take the derivative of $\log(1-a)$ with respect to $a$ (the chain rule on $1-a$); it cancels the leading negative sign.
We'll give both terms the same denominator: $\frac{dL}{da} = \frac{-y(1-a) + a(1-y)}{a(1-a)}$
Clean up the terms: $\frac{dL}{da} = \frac{-y + ay + a - ay}{a(1-a)}$
So now we have: $\frac{dL}{da} = \frac{a-y}{a(1-a)}$
Step 2: $\frac{da}{dz}$
The derivative of a sigmoid has the form: $\sigma'(z) = \sigma(z)\big(1 - \sigma(z)\big)$
You can look up why this derivative has this form. For example, google "derivative of a sigmoid", and you can see the derivation in detail, such as in this article.
Recall that $a = \sigma(z)$, because we defined "a", the activation, as the output of the sigmoid activation function.
So we can substitute into the formula to get: $\frac{da}{dz} = a(1-a)$
Step 3: $\frac{dL}{dz}$
We'll multiply step 1 and step 2 to get the result.
From step 1: $\frac{dL}{da} = \frac{a-y}{a(1-a)}$
From step 2: $\frac{da}{dz} = a(1-a)$
$\frac{dL}{dz} = \frac{a-y}{a(1-a)} \cdot a(1-a)$
Notice that we can cancel factors to get this: $\frac{dL}{dz} = a - y$
In Andrew's notation, he's referring to $\frac{dL}{dz}$ as $dz$.
So in the videos: $dz = a - y$
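The result $dz = a - y$ is what makes the vectorized gradient computation so simple. A minimal NumPy sketch following the course's `A`, `Y`, `dZ` naming (the toy data values here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: n_x = 2 features, m = 4 examples (made-up values).
X = np.array([[0.5, -1.2, 3.0, 0.1],
              [1.5,  0.3, -0.4, 2.2]])
Y = np.array([[1, 0, 1, 0]])
w = np.zeros((2, 1))
b = 0.0
m = X.shape[1]

# Forward pass over all m examples at once.
A = sigmoid(np.dot(w.T, X) + b)   # shape (1, m)

# Backward pass: dZ = A - Y applies dL/dz = a - y to every example.
dZ = A - Y
dw = np.dot(X, dZ.T) / m          # shape (n_x, 1)
db = np.sum(dZ) / m

assert dZ.shape == (1, 4)
assert dw.shape == (2, 1)
```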
Vectorization
Vectorizing Logistic Regression
Vectorizing Logistic Regression's Gradient Output
Broadcasting in Python
Note on Python/Numpy Vectors
Logistic regression cost function
What is a Neural Network?
Neural Network Representation
Computing a Neural Network's Output
Vectorizing across multiple examples
Explanation for Vectorized Implementation
Activation Functions
The tanh function is almost always strictly superior to the sigmoid as a hidden-layer activation. The one exception is the output layer: if y is either 0 or 1, then it makes sense for y hat to be a number between 0 and 1 rather than between -1 and 1. So the one place where I would use the sigmoid activation function is the output layer when you are doing binary classification.
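A small sketch comparing the two activations over a few sample inputs, to make the ranges concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])

s = sigmoid(z)   # values in (0, 1); sigmoid(0) = 0.5
t = np.tanh(z)   # values in (-1, 1); tanh(0) = 0.0

# tanh is zero-centered, which tends to help learning in hidden layers;
# sigmoid's (0, 1) range matches a probability for the output layer.
assert np.isclose(s[1], 0.5) and np.isclose(t[1], 0.0)
assert np.all((s > 0) & (s < 1)) and np.all((t > -1) & (t < 1))
```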
Why do you need non-linear activation functions?
There is just one place where you might use a linear activation function, g(z) = z, and that's if you are doing machine learning on a regression problem, so y is a real number. For example, if you're trying to predict housing prices, y is not 0 or 1 but a real number, anywhere from $0 up to however expensive houses get.
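A quick NumPy demonstration of why hidden layers need non-linear activations: with the identity activation g(z) = z, two layers compose into a single linear layer (the weights here are arbitrary made-up values).

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.random((3, 2)), rng.random((3, 1))
W2, b2 = rng.random((1, 3)), rng.random((1, 1))
x = rng.random((2, 1))

# Two "layers" with identity activation g(z) = z.
two_layer = W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer.
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

# Identical outputs: the extra depth added no expressive power.
assert np.allclose(two_layer, one_layer)
```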
Derivatives of activation functions
Gradient Descent for Neural Network
Back-propagation intuition
Random Initialization
Last updated