# Gradient Descent

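Gradient descent for linear regression boils down to four operations, repeated on each iteration:

1. Calculate the hypothesis h = X \* theta
2. Calculate the loss = h - y, and optionally the squared cost = sum(loss^2) / (2m)
3. Calculate the gradient = X' \* loss / m, where X' is the transpose of X and m is the number of examples
4. Update the parameters theta = theta - alpha \* gradient, where alpha is the learning rate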
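As a short worked step, the gradient in step 3 can be read off by differentiating the cost in step 2 with respect to theta. The symbol $$J$$ is a name introduced here for the cost; it does not appear in the list above.

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_i - y_i)^2 = \frac{1}{2m} (X\theta - y)^T (X\theta - y), \qquad \nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)$$

The factor of 2 from differentiating the square cancels the 2 in the denominator, which is why the gradient is divided by m rather than 2m.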

```python
import numpy as np

# m is the number of training examples (rows of x), not the number of
# features (columns of x)
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()

    for i in range(numIterations):
        # step 1: predictions for the current parameters
        hypothesis = np.dot(x, theta)
        # step 2: per-example error
        loss = hypothesis - y

        # avg squared cost per example. The 2 in 2 * m does not change where
        # the minimum is; it is kept so the cost matches the gradient below.
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))

        # step 3: avg gradient per example
        gradient = np.dot(xTrans, loss) / m

        # step 4: update the parameters
        theta = theta - alpha * gradient

    return theta
```
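One way to sanity-check a routine like the one above is to run it on synthetic data and compare the result with a closed-form least-squares fit. The sketch below is only an illustration under stated assumptions: `gradient_descent` is a hypothetical, quieter variant of the same update (no per-iteration printing, and m taken from the data), and the data, learning rate, and iteration count are made up.

```python
import numpy as np

def gradient_descent(x, y, theta, alpha, num_iterations):
    """Batch gradient descent for linear regression, without printing."""
    m = x.shape[0]  # number of examples
    for _ in range(num_iterations):
        loss = x @ theta - y                  # h - y
        gradient = x.T @ loss / m             # X' * loss / m
        theta = theta - alpha * gradient      # parameter update
    return theta

# Synthetic data (made up for illustration): y = 4 + 3 * x1 + small noise
rng = np.random.default_rng(0)
m = 200
x = np.column_stack([np.ones(m), rng.uniform(-1, 1, m)])  # bias column + one feature
y = 4 + 3 * x[:, 1] + rng.normal(0, 0.1, m)

theta_gd = gradient_descent(x, y, np.zeros(2), alpha=0.1, num_iterations=5000)
theta_ls, *_ = np.linalg.lstsq(x, y, rcond=None)  # least-squares reference

print("gradient descent:", theta_gd)
print("least squares:   ", theta_ls)
```

If the learning rate is small enough and the loop runs long enough, the two parameter vectors should agree closely, and both should land near the generating values 4 and 3.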

### Linear Regression Setup

$$\hat{y}_i = X_{i1} \bullet w_1 + X_{i2} \bullet w_2 + X_{i3} \bullet w_3 + \dots + X_{in} \bullet w_n$$
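The sum above is a dot product between row i of X and the weight vector w, so the predictions for all examples can be computed with a single matrix-vector product. A minimal sketch with made-up numbers (the values of `X` and `w` are purely illustrative):

```python
import numpy as np

# Made-up numbers for illustration: 3 examples (rows), 2 features (columns)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])

# Term-by-term sum for a single example i, exactly as in the formula
i = 0
y_hat_i = sum(X[i, j] * w[j] for j in range(X.shape[1]))

# All examples at once: one matrix-vector product
y_hat = X @ w

print(y_hat_i)  # -1.5
print(y_hat)    # [-1.5 -2.5 -3.5]
```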

### [Sigmoid Function](https://en.wikipedia.org/wiki/Sigmoid_function)

$$\large S(x) = \frac{1}{1+e^{(-x)}} = \frac{e^x}{e^x + 1}$$

```python
import numpy as np

def sigmoid(x):
    # accept scalars, lists, or arrays
    x = np.asarray(x)
    return 1 / (1 + np.exp(-x))
```
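The two equivalent forms in the formula are useful numerically. Exponentiating a large positive number can overflow in floating point, which happens in the first form when x is very negative. One possible workaround, sketched here under the hypothetical name `stable_sigmoid`, is to use the first form for x >= 0 and the second for x < 0, so that only non-positive values are ever exponentiated.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never exponentiates a positive number."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) always lies in [0, 1], so it cannot overflow
    z = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + e^-x)    x < 0: e^x / (e^x + 1)
    return np.where(x >= 0, 1 / (1 + z), z / (1 + z))

print(stable_sigmoid([-1000, 0, 1000]))  # [0.  0.5 1. ]
```

For moderate inputs this should return the same values as the plain version; the difference only shows up at the extremes.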
