# Extensions To Linear Models

### **Interactions**

In statistics, an interaction is a property of three or more variables, where two or more variables *interact in a non-additive manner* when affecting a third variable. In other words, the two variables combine to have an effect that is more (or less) than the sum of their individual effects.
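
For concreteness, here is a minimal sketch (toy data, hypothetical feature values) of adding an interaction term with scikit-learn: `PolynomialFeatures` with `interaction_only=True` generates the cross-product of each pair of predictors as a new feature.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy design matrix with two predictors, x1 and x2
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# interaction_only=True keeps only the cross-product terms
# (x1 * x2 here), skipping the pure powers x1^2 and x2^2
interactions = PolynomialFeatures(degree=2, interaction_only=True,
                                  include_bias=False)
X_int = interactions.fit_transform(X)
# Resulting columns: x1, x2, x1*x2
```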

### **Polynomial regression**

```python
from sklearn.preprocessing import PolynomialFeatures

# Generate all polynomial features (and interaction terms) up to degree 6
poly = PolynomialFeatures(6)
X_fin = poly.fit_transform(X)
```
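
A minimal end-to-end sketch under assumed toy data (the variables `X` and `y` below are hypothetical): the transformed feature matrix is simply fed to an ordinary `LinearRegression`, which then fits a degree-6 polynomial in the original variable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Toy data: a noisy cubic relationship
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(50, 1))
y = X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.3, size=50)

# Expand to polynomial features, then fit a plain linear model
poly = PolynomialFeatures(6)
X_fin = poly.fit_transform(X)
reg = LinearRegression().fit(X_fin, y)

print(reg.score(X_fin, y))  # R^2 on the training data
```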

### **Bias/Variance Trade-Off**

#### Underfitting and Overfitting

Let's formalize this:

> *Underfitting* happens when a model cannot adequately model the training data, and therefore cannot generalize to new data either.

The simple linear regression model we fitted earlier was an underfitted model.

> *Overfitting* happens when a model models the training data too well; in fact, so well that it is not generalizable.

![Bias-variance trade-off](images/bias_variance.png)
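
To make the trade-off concrete, here is a small illustrative sketch (synthetic toy data, hypothetical settings) that fits polynomials of increasing degree and compares training and test error: training error keeps shrinking with degree, while test error eventually rises again as the model overfits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: a quadratic signal plus noise
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=1.0, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=12)

for degree in [1, 2, 6, 12]:
    poly = PolynomialFeatures(degree)
    X_tr = poly.fit_transform(X_train)
    X_te = poly.transform(X_test)
    reg = LinearRegression().fit(X_tr, y_train)
    train_mse = mean_squared_error(y_train, reg.predict(X_tr))
    test_mse = mean_squared_error(y_test, reg.predict(X_te))
    # Low degree underfits (both errors high); high degree overfits
    # (training error low, test error high again)
    print(f"degree {degree:2d}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")
```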

### **Ridge and Lasso Regression**

Lasso and Ridge are two commonly used **regularization techniques**. Regularization is a general term for techniques that combat overfitting by penalizing model complexity.

$$\text{cost\_function\_ridge}= \sum\_{i=1}^n(y\_i - \hat{y}\_i)^2 + \lambda \sum\_{j=1}^k m\_j^2 = \sum\_{i=1}^n\Big(y\_i - \big(\sum\_{j=1}^k m\_jx\_{ij} + b\big)\Big)^2 + \lambda \sum\_{j=1}^k m\_j^2$$

Ridge regression is also often referred to as **L2 norm regularization**.

$$\text{cost\_function\_lasso}= \sum\_{i=1}^n(y\_i - \hat{y}\_i)^2 + \lambda \sum\_{j=1}^k \mid m\_j \mid = \sum\_{i=1}^n\Big(y\_i - \big(\sum\_{j=1}^k m\_jx\_{ij} + b\big)\Big)^2 + \lambda \sum\_{j=1}^k \mid m\_j \mid$$

Lasso regression is also often referred to as **L1 norm regularization**.

```python
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.model_selection import train_test_split

...

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=12)

# Ridge and Lasso regression models.
# Note that in scikit-learn, the regularization parameter is
# denoted by alpha (and not lambda)
ridge = Ridge(alpha=0.5)
ridge.fit(X_train, y_train)

lasso = Lasso(alpha=0.5)
lasso.fit(X_train, y_train)
```
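
As a follow-up sketch continuing from the models fitted above, inspecting the learned coefficients shows the practical difference between the two penalties: the L1 penalty tends to drive some coefficients exactly to zero (effectively performing feature selection), while the L2 penalty only shrinks them toward zero.

```python
import numpy as np

print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)

# Lasso typically zeroes out some coefficients entirely
print("Features dropped by Lasso:", np.sum(lasso.coef_ == 0))
```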

{% embed url="https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/" %}

### **AIC and BIC**

#### AIC (Akaike Information Criterion)

> **AIC(model) = -2 \* log-likelihood(model) + 2 \* (length of the parameter space, i.e., the number of estimated parameters)**

#### BIC (Bayesian Information Criterion)

> **BIC(model) = -2 \* log-likelihood(model) + log(number of observations) \* (length of the parameter space, i.e., the number of estimated parameters)**

#### Uses of the AIC and BIC

* Performing feature selection: compare models with fewer and with more variables, compute the AIC/BIC for each, and select the feature set that yields the lowest AIC or BIC (see the sketch after this list)
* Similarly, deciding whether or not to include interactions/polynomial features depending on whether the AIC/BIC decreases when they are added
* Computing the AIC and BIC for several values of the regularization parameter in Ridge/Lasso models and selecting the best regularization parameter.
* Many more!
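
As a minimal sketch of the feature-selection use case (synthetic toy data; `statsmodels` is assumed to be available, and its fitted OLS results expose `aic` and `bic` attributes):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: y depends on x1 and x2, but not on x3
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=100)

# Compare a two-variable model against a three-variable model;
# the lower AIC/BIC indicates the preferable feature set
for cols in ([0, 1], [0, 1, 2]):
    results = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    print(f"variables {cols}: AIC = {results.aic:.1f}, BIC = {results.bic:.1f}")
```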
