Gradient Descent

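Gradient descent for linear regression boils down to four operations per iteration. Here m is the number of training examples, alpha is the learning rate, and X' is the transpose of X.

Calculate the hypothesis: h = X * theta
Calculate the loss: loss = h - y, and optionally the squared cost: sum(loss^2) / (2m)
Calculate the gradient: gradient = X' * loss / m
Update the parameters: theta = theta - alpha * gradient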
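
The gradient step can be checked by differentiating the squared cost. The short derivation below is a sketch that assumes the cost is named $J(\theta)$ (a symbol not used above) and writes X' as $X^T$:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_i - y_i)^2 = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$

$\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)$

$\theta \leftarrow \theta - \alpha \nabla_\theta J(\theta)$

The factor of 2 in the denominator cancels the 2 produced by differentiating the square, which is why the gradient is divided by m rather than 2m.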

import numpy as np

# m is the number of training examples, not the number of features.
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        
        # Average cost per example. The 2 in 2 * m does not change the minimizer;
        # it is kept so the cost is consistent with the gradient below.
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        
        # update
        theta = theta - alpha * gradient
        
    return theta
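
To see the loop converge, it can be run on a small synthetic problem. The sketch below is a compact variant of the function above, not a replacement for it: it assumes m can be read from the shape of x, it collects the cost history instead of printing it, and the names `gradient_descent` and `costs`, the data, and the values of alpha and the iteration count are all made up for illustration.

```python
import numpy as np

def gradient_descent(x, y, theta, alpha, num_iterations):
    """Batch gradient descent for linear regression.

    Returns the final theta and the cost recorded at each iteration.
    """
    m = x.shape[0]  # number of examples (rows of x)
    costs = []
    for _ in range(num_iterations):
        loss = x @ theta - y                        # hypothesis minus target
        costs.append(np.sum(loss ** 2) / (2 * m))   # squared cost
        theta = theta - alpha * (x.T @ loss) / m    # gradient step
    return theta, costs

# Hypothetical data: y = 4 + 3 * x1 plus a little Gaussian noise
rng = np.random.default_rng(0)
m = 100
x1 = rng.uniform(0, 2, size=m)
x = np.column_stack([np.ones(m), x1])   # column of ones models the intercept
y = 4 + 3 * x1 + rng.normal(0, 0.1, size=m)

theta, costs = gradient_descent(x, y, np.zeros(2), alpha=0.1, num_iterations=2000)
print(theta)                  # should land close to [4, 3]
print(costs[0], costs[-1])    # cost at the first and last iteration
```

With a learning rate this small relative to the scale of the features, the cost should fall steadily; a rate that is too large makes the cost grow instead.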

Linear Regression Setup

$\hat{y}_i = X_{i1} \cdot w_1 + X_{i2} \cdot w_2 + X_{i3} \cdot w_3 + \dots + X_{in} \cdot w_n$
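
The sum over features for each example i is a matrix-vector product. A minimal sketch, assuming a made-up X with 3 examples and 2 features and a made-up weight vector w:

```python
import numpy as np

# Hypothetical data: 3 examples (rows), 2 features (columns)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])

# Explicit sum over features j for each example i, as in the formula
y_hat_loop = np.array([
    sum(X[i, j] * w[j] for j in range(X.shape[1]))
    for i in range(X.shape[0])
])

# The same predictions as a single matrix-vector product
y_hat = X @ w
print(y_hat)  # [-1.5 -2.5 -3.5]
```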

Sigmoid Function

$\large S(x) = \frac{1}{1+e^{-x}} = \frac{e^x}{e^x + 1}$

import numpy as np 

def sigmoid(x):
    # Accept scalars, lists, or arrays and compute in floating point
    x = np.asarray(x, dtype=float)
    return 1 / (1 + np.exp(-x))
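
The two equivalent forms of S(x) are useful numerically: the first form can trigger an overflow warning for large negative x, because the exponential of -x becomes huge. One common workaround, sketched here under the hypothetical name `stable_sigmoid`, is to use whichever form keeps the exponent non-positive.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that only ever exponentiates a non-positive number."""
    x = np.asarray(x, dtype=float)
    z = np.exp(-np.abs(x))  # always in (0, 1], so it cannot overflow
    # For x >= 0, z = e^-x, so use 1 / (1 + e^-x).
    # For x <  0, z = e^x,  so use e^x / (e^x + 1).
    return np.where(x >= 0, 1 / (1 + z), z / (1 + z))

print(stable_sigmoid([-1000, 0, 1000]))  # values 0.0, 0.5, 1.0
```

For moderate inputs this agrees with the plain formula; the difference only shows up at the extremes.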

