# Distance Metrics

### Best Practices

* What decisions do I need to make regarding my data? How might these decisions affect overall performance?
* Which predictors do I need? How can I confirm that I have the right predictors?
* What parameter values (if any) should I choose for my model? How can I find the optimal value for a given parameter?
* What metrics will I use to evaluate the performance of my model? Why?
* How do I know if there's room left for improvement with my model? Are the potential performance gains worth the time needed to reach them?

### Workflow

* Setup
  * Import standard libraries
  * Import and read the dataset
* Preprocessing Data
  * Remove unnecessary columns
  * Convert feature(s) to binary encoding
  * Detect and deal with null values
  * One-Hot Encode categorical columns
  * Store target column in a separate variable and remove it from the DataFrame
* Normalize Data
  * `StandardScaler`
  * `.fit_transform()`
  * Create training and testing sets (`train_test_split`)
* Creating and Fitting KNN Model
  * `KNeighborsClassifier`
  * Fit the classifier to training data/labels (labels = target)
  * Use the classifier to generate predictions
* Precision, Recall, Accuracy and F1-Score
  * `from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score`
* Improving Model Performance
  * Write a function that takes six parameters:

    * `X_train`, `y_train`, `X_test`, and `y_test`
    * `min_k` and `max_k`, which default to `1` and `25`

    Create two variables, `best_k` and `best_score`

    Iterate through every ***odd number*** between `min_k` and `max_k + 1`.

    For each iteration:

    * Create a new KNN classifier, and set the `n_neighbors` parameter to the current value for k, as determined by our loop.
    * Fit this classifier to the training data.
    * Generate predictions for `X_test` using the fitted classifier.
    * Calculate the ***F1-score*** for these predictions.
    * Compare this F1-score to `best_score`. If better, update `best_score` and `best_k`.

    Once every value of `k` has been checked, print the best value of `k` and the F1-score it achieved.
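
The workflow steps from preprocessing through scoring can be sketched roughly as follows. This is a minimal illustration, not the notes' actual dataset: the toy DataFrame, its column names (`id`, `sex`, `age`, `port`, `target`), the median fill for nulls, and the `test_size` and `random_state` values are all assumptions made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical toy dataset standing in for the real one (all columns are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "id": np.arange(n),                             # carries no signal
    "sex": rng.choice(["male", "female"], size=n),  # two-valued feature
    "age": rng.normal(35, 10, size=n),
    "port": rng.choice(["A", "B", "C"], size=n),    # categorical feature
})
df.loc[rng.choice(n, size=10, replace=False), "age"] = np.nan  # inject some nulls
df["target"] = ((df["sex"] == "female") | (df["age"] < 25)).astype(int)

# Preprocessing
df = df.drop(columns=["id"])                          # remove unnecessary columns
df["sex"] = (df["sex"] == "female").astype(int)       # binary encoding
df["age"] = df["age"].fillna(df["age"].median())      # one way to deal with nulls
df = pd.get_dummies(df, columns=["port"], dtype=int)  # one-hot encode categoricals
y = df["target"]                                      # store the target separately
X = df.drop(columns=["target"])

# Normalize, then split (same order as the outline above)
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=42
)

# Create and fit the KNN model, then generate predictions
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

# Evaluate with the four metrics
scores = {
    "precision": precision_score(y_test, preds),
    "recall": recall_score(y_test, preds),
    "accuracy": accuracy_score(y_test, preds),
    "f1": f1_score(y_test, preds),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The sketch scales before splitting to mirror the outline. A common variation is to split first and fit the scaler on the training split only, so that no test-set statistics leak into the scaling.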
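
The k-search described under Improving Model Performance might look something like the sketch below. The function name `find_best_k`, the returned tuple, and the synthetic data from `make_classification` are assumptions added for illustration; the notes only ask for the best `k` and its F1-score to be printed.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def find_best_k(X_train, y_train, X_test, y_test, min_k=1, max_k=25):
    """Try k = min_k, min_k + 2, ... up to max_k and keep the best F1-score.

    Assumes min_k is odd, so stepping by 2 visits only odd values of k.
    """
    best_k = None
    best_score = -1.0  # below any possible F1, so the first k always registers

    for k in range(min_k, max_k + 1, 2):
        knn = KNeighborsClassifier(n_neighbors=k)  # new classifier for this k
        knn.fit(X_train, y_train)
        preds = knn.predict(X_test)
        score = f1_score(y_test, preds)
        if score > best_score:
            best_k, best_score = k, score

    print(f"Best value for k: {best_k}")
    print(f"F1-score: {best_score:.4f}")
    return best_k, best_score  # returning is an addition beyond the notes


# Synthetic binary-classification data, used here only to make the sketch runnable.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

best_k, best_score = find_best_k(X_train, y_train, X_test, y_test)
```

Because `k` is tuned against `X_test` here, the reported F1-score is an optimistic estimate; a separate validation split or cross-validation would give a fairer number.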
