# Distance Metrics

### Best Practices

* What decisions do I need to make regarding my data? How might these decisions affect overall performance?
* Which predictors do I need? How can I confirm that I have the right predictors?
* What parameter values (if any) should I choose for my model? How can I find the optimal value for a given parameter?
* What metrics will I use to evaluate the performance of my model? Why?
* How do I know if there's room left for improvement with my model? Are the potential performance gains worth the time needed to reach them?

### Workflow

* Setup
  * Import standard libraries
  * Import and read the dataset
* Preprocessing Data
  * Remove unnecessary columns
  * Convert feature(s) to binary encoding
  * Detect and deal with null values
  * One-Hot Encode categorical columns
  * Store the target column in a separate variable and remove it from the DataFrame
* Normalize Data
  * Scale the features with `StandardScaler`, using `.fit_transform()`
  * Create training and testing sets (`train_test_split`)
* Creating and Fitting KNN Model
  * KNeighborsClassifier
  * Fit the classifier to training data/labels (labels = target)
  * Use the classifier to generate predictions
* Precision, Recall, Accuracy and F1-Score
  * `from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score`
* Improving Model Performance
  * Write a function that takes six parameters:

    * `X_train`, `y_train`, `X_test`, and `y_test`
    * `min_k` and `max_k`. Set these to `1` and `25`, by default

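    Create two variables, `best_k` and `best_score`, to track the best result seen so far.

    Iterate through every ***odd number*** from `min_k` to `max_k`, inclusive (for example, `range(min_k, max_k + 1, 2)` when `min_k` is odd).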

    For each iteration:

    * Create a new KNN classifier, and set the `n_neighbors` parameter to the current value for k, as determined by our loop.
    * Fit this classifier to the training data.
    * Generate predictions for `X_test` using the fitted classifier.
    * Calculate the ***F1-score*** for these predictions.
    * Compare this F1-score to `best_score`. If better, update `best_score` and `best_k`.

    Once every value of `k` has been checked, print the best value of `k` and the F1-score it achieved.
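
The search over `k` described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name `find_best_k` is hypothetical, returning the result (in addition to printing it) is an addition for convenience, and `f1_score` is called with its defaults, which assumes a binary target.

```python
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier


def find_best_k(X_train, y_train, X_test, y_test, min_k=1, max_k=25):
    """Try odd values of k and report the one with the highest F1-score.

    `find_best_k` is an illustrative name, not taken from the notes.
    """
    best_k = None
    best_score = -1.0  # lower than any possible F1-score, so the first k always wins

    # A step of 2 only yields odd numbers when min_k itself is odd.
    for k in range(min_k, max_k + 1, 2):
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train, y_train)
        preds = knn.predict(X_test)

        # Default f1_score settings assume a binary target.
        score = f1_score(y_test, preds)
        if score > best_score:
            best_k, best_score = k, score

    print(f"Best value for k: {best_k}")
    print(f"F1-score: {best_score}")
    return best_k, best_score
```

Note that `max_k` must not exceed the number of training samples, since `KNeighborsClassifier` cannot look up more neighbors than it has stored.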


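The workflow steps from preprocessing through scoring might look roughly like the sketch below. The toy DataFrame, its column names, the median fill for nulls, and `n_neighbors=3` are all assumptions made up for illustration; a real dataset would replace them.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; columns and values are invented for this sketch.
df = pd.DataFrame({
    "id": range(12),
    "sex": ["male", "female"] * 6,
    "age": [22, 38, None, 35, 54, 2, 27, 14, 58, 20, None, 31],
    "port": ["S", "C", "S", "Q", "S", "S", "C", "C", "Q", "S", "S", "C"],
    "target": [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1],
})

# Preprocessing
df = df.drop(columns=["id"])                        # remove unnecessary columns
df["sex"] = (df["sex"] == "female").astype(int)     # binary encoding
df["age"] = df["age"].fillna(df["age"].median())    # one way to handle nulls
df = pd.get_dummies(df, columns=["port"], dtype=int)  # one-hot encode categoricals

y = df["target"]                 # store the target separately
X = df.drop(columns=["target"])  # and remove it from the features

# Normalize, then split, following the order in the notes.
# Fitting the scaler on the training split only would avoid
# leaking test-set statistics into training.
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=42
)

# Create and fit the KNN model, then generate predictions.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
preds = knn.predict(X_test)

# zero_division=0 avoids warnings when a class is never predicted.
scores = {
    "precision": precision_score(y_test, preds, zero_division=0),
    "recall": recall_score(y_test, preds, zero_division=0),
    "accuracy": accuracy_score(y_test, preds),
    "f1": f1_score(y_test, preds, zero_division=0),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With only twelve rows the scores themselves are not meaningful; the sketch is only meant to show how the steps connect.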