# Logistic Regression

### Terminology Review (KNN)

Let's take a moment and review some classification evaluation metrics:

$$Precision = \frac{\text{Number of True Positives}}{\text{Number of Predicted Positives}}$$

$$Recall = \frac{\text{Number of True Positives}}{\text{Number of Actual Total Positives}}$$

$$Accuracy = \frac{\text{Number of True Positives} + \text{Number of True Negatives}}{\text{Total Observations}}$$
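As a quick check of the three formulas, here is a minimal sketch that computes them from raw counts. The counts (`tp`, `fp`, `fn`, `tn`) are made-up numbers for illustration, not results from any model in these notes.

```python
# Hypothetical counts, chosen only to illustrate the formulas
tp, fp, fn, tn = 40, 10, 20, 30

# Precision: of everything predicted positive, how much was truly positive?
precision = tp / (tp + fp)

# Recall: of everything actually positive, how much did we find?
recall = tp / (tp + fn)

# Accuracy: share of all observations that were classified correctly
accuracy = (tp + tn) / (tp + fp + fn + tn)

print('Precision: {:.3f}'.format(precision))
print('Recall:    {:.3f}'.format(recall))
print('Accuracy:  {:.3f}'.format(accuracy))
```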

### Confusion Matrices

![](/files/-LVD6g2od7X9zSKr68Sd)
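A confusion matrix can be tabulated by hand from true and predicted labels. The sketch below is plain Python with toy labels; the helper name `confusion_matrix_2x2` is hypothetical, and the layout (rows are actual classes, columns are predicted classes) is one common convention, assumed here.

```python
from collections import Counter

def confusion_matrix_2x2(y_true, y_pred, labels=(0, 1)):
    """Return a nested list: rows = actual class, columns = predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]

# Toy labels for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix_2x2(y_true, y_pred)
(tn, fp), (fn, tp) = matrix
print(matrix)
print('TN: {}, FP: {}, FN: {}, TP: {}'.format(tn, fp, fn, tp))
```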

### ROC Curve

The Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate of our classifier across decision thresholds. When training a classifier, we hope the ROC curve will hug the upper-left corner of the graph. A classifier with 50-50 accuracy, whose ROC curve lies along the diagonal, is deemed 'worthless'; it is no better than random guessing, as in the case of a coin flip.

![](/files/-LVD7tIzFi_pamYvPGpl)
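To make the construction of the curve concrete, here is a rough sketch that sweeps a threshold over a handful of scores and records one (false positive rate, true positive rate) point per threshold. The function name `roc_points` and the toy data are assumptions for illustration; in practice a library routine would normally do this.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) pairs, one per distinct score used as a threshold."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(y_score), reverse=True):
        # Everything scoring at or above the threshold is predicted positive
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Toy labels and scores for illustration only
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, y_score)
for fpr, tpr in points:
    print('FPR: {:.2f}  TPR: {:.2f}'.format(fpr, tpr))
```

Lowering the threshold can only add predicted positives, so both rates move from (0, 0) toward (1, 1).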

### AUC

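AUC (Area Under \[the] Curve) is a comprehensive alternative metric to the confusion matrices we examined previously: it summarizes the ROC curve in a single number, where 1.0 is a perfect classifier and 0.5 is no better than random guessing. ROC graphs, in turn, allow us to determine the optimal tradeoff between true positives and false positives for the specific problem we are looking to solve.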

```python
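from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

# scikit-learn's built-in roc_curve function returns the fpr, tpr and thresholds
# for various decision boundaries given the class membership scores

# X_train, y_train, X_test, y_test are assumed to come from an earlier train/test split
logreg = LogisticRegression()  # default settings, assumed here for illustration

# First calculate the decision scores of each of the test datapoints:
y_score = logreg.fit(X_train, y_train).decision_function(X_test)

fpr, tpr, thresholds = roc_curve(y_test, y_score)

# From there we can easily calculate the AUC
print('AUC: {}'.format(auc(fpr, tpr)))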
```
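The area itself is just a sum of trapezoids under the ROC points. The sketch below reproduces that arithmetic in plain Python; `trapezoid_auc` is a hypothetical helper and the `fpr`/`tpr` lists are toy values, not output from the model above.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a curve given as points sorted by increasing fpr."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2
        area += width * mean_height
    return area

# Toy ROC points for illustration only
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
toy_auc = trapezoid_auc(fpr, tpr)

# The diagonal (random guessing) for comparison
diagonal_auc = trapezoid_auc([0.0, 1.0], [0.0, 1.0])

print('Toy AUC: {}'.format(toy_auc))
print('Diagonal AUC: {}'.format(diagonal_auc))
```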

### Class Imbalance

#### Class Weight

```
class_weight : dict or 'balanced', default: None
    Weights associated with classes in the form 
    ``{class_label: weight}``.
    If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically 
    adjust weights inversely proportional to class frequencies 
    in the input data as 
    ``n_samples / (n_classes * np.bincount(y))``.

    Note that these weights will be multiplied with 
    sample_weight (passed through the fit method) if 
    sample_weight is specified.

    .. versionadded:: 0.17
       *class_weight='balanced'*
```
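The "balanced" formula quoted above, `n_samples / (n_classes * np.bincount(y))`, can be worked through by hand. This is a minimal sketch in plain Python (using `Counter` in place of `np.bincount`); the helper name and the 90/10 label split are assumptions for illustration.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Toy labels: 90 negatives and 10 positives
y = [0] * 90 + [1] * 10
class_weights = balanced_class_weights(y)

for label in sorted(class_weights):
    print('class {}: weight {:.4f}'.format(label, class_weights[label]))
```

The rarer class receives the larger weight, so each class contributes the same total weight to the fit.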

Example:

```python
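from sklearn.linear_model import LogisticRegression

# Compare no weighting, automatic balancing, and increasingly heavy
# weights on the positive class (label 1)
weights = [None, 'balanced', {1:2, 0:1}, {1:10, 0:1}, 
           {1:100, 0:1}, {1:1000, 0:1}]
for n, weight in enumerate(weights):
    logreg = LogisticRegression(fit_intercept = False, 
                                C = 1e12,  # very large C: effectively no regularization
                                class_weight=weight)
    ...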
```

#### Oversampling/Undersampling

SMOTE (Synthetic Minority Oversampling Technique):

```python
import pandas as pd
from imblearn.over_sampling import SMOTE, ADASYN

print(y.value_counts()) # Preview original class distribution
# fit_resample replaces the older fit_sample, which recent imbalanced-learn releases removed
X_resampled, y_resampled = SMOTE().fit_resample(X, y) 
print(pd.Series(y_resampled).value_counts()) # Preview synthetic sample class distribution

# Output:
# 0    99773
# 1      227
# Name: is_attributed, dtype: int64
#
# 1    99773
# 0    99773
# dtype: int64
```

The ROC curve is misleading when the test set was also manipulated using SMOTE. This produces results that will not be comparable to future cases, because the test cases were synthetically created. SMOTE should only be applied to the training set; an accurate gauge of the model's performance then comes from a raw test sample that has not been oversampled or undersampled.
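The order of operations matters here: split first, then resample only the training portion. The sketch below shows that workflow in plain Python on toy data. Note that it uses simple random duplication of minority rows as a stand-in for SMOTE (it does not synthesize new points), and the helper names are hypothetical.

```python
import random

def stratified_split(y, test_fraction, rng):
    """Split indices per class so both classes appear in train and test."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = int(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

def oversample_minority(X, y, rng):
    """Stand-in for SMOTE: duplicate random minority rows until classes match."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

rng = random.Random(0)

# Toy imbalanced data: 10 positives and 190 negatives
X = [[float(i)] for i in range(200)]
y = [1 if i % 20 == 0 else 0 for i in range(200)]

# 1. Split first, so the test set keeps the original class distribution
train_idx, test_idx = stratified_split(y, 0.25, rng)
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# 2. Resample the training set only; the test set is left untouched
X_train_res, y_train_res = oversample_minority(X_train, y_train, rng)

print('Train before: {} negatives, {} positives'.format(y_train.count(0), y_train.count(1)))
print('Train after:  {} negatives, {} positives'.format(y_train_res.count(0), y_train_res.count(1)))
print('Test (raw):   {} negatives, {} positives'.format(y_test.count(0), y_test.count(1)))
```

Any ROC curve or AUC would then be computed on `X_test` and `y_test`, which still reflect the original imbalance.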

