Extensions To Linear Models
In statistics, an interaction is a property of three or more variables in which two or more of them affect a third in a non-additive manner. In other words, the variables combine to have an effect that is more (or less) than the sum of their individual effects.
Let's formalize this:
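For two predictors, one common way to write this down (a sketch, using the same slope/intercept notation, the m's and b, that appears in the cost functions below) is to add a product term to the regression:

\hat{y} = m_1 x_1 + m_2 x_2 + m_3 (x_1 \cdot x_2) + b

The slope on x_1 is then effectively m_1 + m_3 x_2: the effect of x_1 on the prediction changes with the value of x_2, which is exactly the non-additive behavior described above.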
Under-fitting happens when a model can neither model the training data nor generalize to new data.
Our simple linear regression model fitted earlier was an under-fitted model.
Overfitting happens when a model models the training data too well. In fact, so well that it does not generalize to new data.
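To make the two failure modes concrete, here is a minimal sketch (the synthetic data and the scikit-learn setup are illustrative assumptions, not part of the original text) that fits polynomials of increasing degree to noisy nonlinear data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)  # nonlinear target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # under-fit, reasonable, over-fit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))
```

The degree-1 model typically scores poorly on both sets (under-fitting), while the degree-15 model scores well on the training data but noticeably worse on the test data (over-fitting).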
Lasso and Ridge are two commonly used regularization techniques. Regularization is a general term for methods used to combat overfitting.
\text{cost\_function\_ridge} = \sum_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^k m_j^2 = \sum_{i=1}^n\Big(y_i - \big(\sum_{j=1}^k m_j x_{ij} + b\big)\Big)^2 + \lambda \sum_{j=1}^k m_j^2
Ridge regression is often also referred to as L2 Norm Regularization
\text{cost\_function\_lasso} = \sum_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^k |m_j| = \sum_{i=1}^n\Big(y_i - \big(\sum_{j=1}^k m_j x_{ij} + b\big)\Big)^2 + \lambda \sum_{j=1}^k |m_j|
Lasso regression is often also referred to as L1 Norm Regularization
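As a concrete illustration, here is a minimal sketch using scikit-learn (an assumption; the text names no library). Note that scikit-learn exposes the regularization parameter lambda as `alpha`:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))           # 10 features, only 3 truly matter
true_m = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_m + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)       # penalizes the sum of squared slopes
lasso = Lasso(alpha=0.1).fit(X, y)       # penalizes the sum of absolute slopes

print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))
```

Ridge shrinks all slopes toward zero, while Lasso can set some slopes exactly to zero, effectively performing a form of feature selection.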
\text{AIC}(\text{model}) = -2 \cdot \text{log-likelihood}(\text{model}) + 2k

\text{BIC}(\text{model}) = -2 \cdot \text{log-likelihood}(\text{model}) + \ln(n) \cdot k

where k is the number of estimated parameters and n is the number of observations (the logarithm is the natural logarithm).
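Written directly as code (the function and argument names here are illustrative):

```python
import numpy as np

def aic(log_likelihood, k):
    """AIC: -2 * log-likelihood plus a penalty of 2 per parameter."""
    return -2 * log_likelihood + 2 * k

def bic(log_likelihood, k, n):
    """BIC: like AIC, but the penalty per parameter grows with log(n)."""
    return -2 * log_likelihood + np.log(n) * k
```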
Performing feature selection: comparing models with fewer and with more variables, computing the AIC/BIC for each, and selecting the feature set that yields the lowest AIC or BIC (see the sketch after this list)
Similarly, deciding whether to include interaction/polynomial features, depending on whether the AIC/BIC decreases when they are added
Computing the AIC and BIC for several values of the regularization parameter in Ridge/Lasso models and selecting the best regularization parameter.
Many more!
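Here is a minimal sketch of the first use case, feature selection via AIC/BIC, assuming statsmodels (which reports both criteria on fitted OLS results); the data is synthetic and illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=200)  # X[:, 2] is pure noise

# Fit nested models and compare their information criteria.
for cols in ([0], [0, 1], [0, 1, 2]):
    design = sm.add_constant(X[:, cols])
    result = sm.OLS(y, design).fit()
    print(cols, round(result.aic, 1), round(result.bic, 1))
```

The [0, 1] model should typically show the lowest AIC/BIC: adding the noise feature X[:, 2] improves the fit slightly, but not enough to offset the penalty for the extra parameter.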