Logistic Regression
Terminology Review (KNN)
Let's take a moment to review some classification evaluation metrics:
Confusion Matrices
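As a quick refresher, a confusion matrix tabulates predictions against actual labels. A minimal sketch using scikit-learn (the labels and predictions below are made-up placeholders):

```python
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical true labels and predictions for a binary classifier
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))

# Per-class precision, recall, and F1 derived from the matrix
print(classification_report(y_true, y_pred))
```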

ROC Curve
The Receiver Operating Characteristic curve (ROC curve) plots the true positive rate of our classifier against its false positive rate across all decision thresholds. When training a classifier, we are hoping the ROC curve will hug the upper left corner of our graph. A classifier whose curve sits on the diagonal (a 50-50 classifier) is deemed 'worthless'; it is no better than random guessing, as in the case of a coin flip.
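As a hedged sketch of how such a curve is produced, scikit-learn's roc_curve returns the false and true positive rates at every threshold (the dataset here is a synthetic placeholder):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary dataset; any classifier with predict_proba works here
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, probs)
plt.plot(fpr, tpr, label="classifier")
plt.plot([0, 1], [0, 1], "--", label="coin flip (worthless)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```

A good classifier's curve bends toward the upper left; the dashed diagonal is the coin-flip baseline.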

AUC
AUC (Area Under [the] Curve) condenses the entire ROC curve into a single number, making it a comprehensive alternative to the confusion matrices we previously examined. Together with ROC graphs, it lets us determine the tradeoff between true positive and false positive rates best suited to the specific problem we are looking to solve.
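Continuing the ROC sketch above (reusing its y_test and probs variables), AUC collapses the whole curve into one number:

```python
from sklearn.metrics import roc_auc_score

# 0.5 is the coin-flip baseline; 1.0 ranks every positive above every negative
print(f"AUC = {roc_auc_score(y_test, probs):.3f}")
```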
Class Imbalance
Class Weight
Example:
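The original example is not shown on this page; as a hedged stand-in, scikit-learn's LogisticRegression accepts a class_weight parameter that reweights the loss so minority-class mistakes cost more:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: ~95% negatives, ~5% positives
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# "balanced" weights each class by n_samples / (n_classes * class_count),
# so the rare class contributes as much to the loss as the common one
weighted = LogisticRegression(class_weight="balanced").fit(X_train, y_train)
unweighted = LogisticRegression().fit(X_train, y_train)

print(classification_report(y_test, unweighted.predict(X_test)))
print(classification_report(y_test, weighted.predict(X_test)))
```

Typically the weighted model trades some precision for noticeably better recall on the minority class.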
Oversampling/Undersampling
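One common way to implement both resampling strategies is the imbalanced-learn library (an assumption here, not something this page names):

```python
from collections import Counter

from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("original:    ", Counter(y))

# Oversampling duplicates minority-class rows until the classes are balanced
X_over, y_over = RandomOverSampler(random_state=0).fit_resample(X, y)
print("oversampled: ", Counter(y_over))

# Undersampling instead discards majority-class rows
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("undersampled:", Counter(y_under))
```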
SMOTE (Synthetic Minority Oversampling):
The ROC curve here is misleading because the test set was also manipulated using SMOTE. The results will not be comparable to future cases, since some of the test cases were synthetically created. SMOTE should only be applied to the training set; the model's performance can then be accurately gauged on a raw test sample that has not been oversampled or undersampled.
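A minimal sketch of the correct workflow, assuming imbalanced-learn's SMOTE: split first, resample only the training fold, and evaluate on the untouched test set:

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Split BEFORE resampling so the test set stays raw
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# SMOTE interpolates new synthetic minority samples -- training fold only
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

model = LogisticRegression().fit(X_res, y_res)
probs = model.predict_proba(X_test)[:, 1]

# Because the test set was never resampled, this is an honest estimate
print(f"AUC on raw test set: {roc_auc_score(y_test, probs):.3f}")
```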