# Modeling data distribution

### **z-score** - how many $$\sigma$$ away from the mean $$\mu$$

Example: a value of 65 in a distribution with $$\mu = 81$$ and $$\sigma = 6.3$$:

$$\frac{\text{value} - \mu}{\sigma} \Longrightarrow \frac{65-81}{6.3} \approx -2.54$$

A z-score measures exactly how many standard deviations above or below the mean a data point is. Here's the formula for calculating a z-score:

$$z = \frac{\text{data point} - \text{mean}}{\text{standard deviation}} \Longrightarrow z = \frac{x - \mu}{\sigma}$$
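The formula translates directly into a few lines of Python. This is a minimal sketch; the function name `z_score` and its parameter names are chosen here for illustration and do not come from the original notes.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


# The example from above: value 65, mean 81, standard deviation 6.3
z = z_score(65, 81, 6.3)
print(round(z, 2))  # -2.54
```

The negative sign indicates that 65 lies below the mean of 81.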

Here are some important facts about z-scores:

* A positive z-score says the data point is above average.
* A negative z-score says the data point is below average.
* A z-score close to $$0$$ says the data point is close to average.
* A data point can be considered unusual if its z-score is above $$3$$ or below $$-3$$.
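These rules of thumb can be sketched as a small helper. The name `describe_z` and the `unusual_cutoff` parameter are hypothetical; the cutoff of 3 is the one given in the list above, and the "unusual" label is checked first so it takes priority over the direction.

```python
def describe_z(z, unusual_cutoff=3.0):
    """Label a z-score using the rules of thumb listed above."""
    if abs(z) > unusual_cutoff:
        return "unusual"
    if z > 0:
        return "above average"
    if z < 0:
        return "below average"
    return "exactly average"


# A few illustrative z-scores (made-up values, apart from -2.54 from the example)
for z in (-3.4, -2.54, 0.0, 1.2):
    print(z, "->", describe_z(z))
```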

### **Standard deviation and IQR change only with** multiplication and division, but not with addition and subtraction. The mean and median change either way.

![](/files/-LM2_-iWfSW3WXpUbbu7)
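The effect of shifting and scaling can be checked numerically. The sketch below uses a made-up data set and Python's standard `statistics` module; the helper name `summary` is an assumption for illustration, and the IQR is computed with the module's default quartile method, which may differ slightly from hand methods.

```python
import statistics


def summary(data):
    """Return the mean, median, population standard deviation and IQR of data."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles, default method
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std_dev": statistics.pstdev(data),
        "iqr": q3 - q1,
    }


data = [2, 4, 4, 6, 8, 10, 12]      # made-up sample
shifted = [x + 5 for x in data]     # add 5 to every value
scaled = [x * 3 for x in data]      # multiply every value by 3

for name, values in (("original", data), ("shifted", shifted), ("scaled", scaled)):
    print(name, summary(values))
```

Adding 5 moves the mean and median by 5 but leaves the standard deviation and IQR unchanged; multiplying by 3 scales all four measures by 3.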

### Normal distribution: Empirical Rule (68-95-99.7%)

![](/files/-LM2isb9wnYuEdJAlafB)

### Standard normal distribution:

$$\mu = 0 (\text{mean})\newline \sigma = 1 (\text{standard deviation})$$
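Converting every value in a data set to its z-score rescales the data to mean 0 and standard deviation 1, which is how a normal distribution is mapped onto the standard normal one. A minimal sketch with made-up data, using `statistics.NormalDist` from the Python standard library:

```python
import statistics

# NormalDist() with no arguments is the standard normal: mu=0.0, sigma=1.0
standard_normal = statistics.NormalDist()
print(standard_normal.mean, standard_normal.stdev)

data = [70, 75, 81, 84, 95]          # made-up values
mu = statistics.mean(data)
sigma = statistics.pstdev(data)      # population standard deviation

# Standardize: replace each value by its z-score
z_scores = [(x - mu) / sigma for x in data]

# Up to floating-point error, the z-scores have mean 0 and standard deviation 1
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```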

### What is a normal distribution?

Early statisticians noticed the same shape coming up over and over again in different distributions—so they named it the normal distribution.

![](/files/-LM2mNfEEnA6QPR-snOO)

Normal distributions have the following features:

* symmetric bell shape
* mean and median are equal; both located at the center of the distribution
* ≈ 68% of the data falls within 1 standard deviation of the mean
* ≈ 95% of the data falls within 2 standard deviations of the mean
* ≈ 99.7% of the data falls within 3 standard deviations of the mean
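The three percentages can be reproduced from the cumulative distribution function (CDF) of the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is chosen here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.4f}")
```

The results are approximately 0.6827, 0.9545 and 0.9973, which round to the 68%, 95% and 99.7% of the empirical rule.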

### Quiz

A set of average city temperatures in August are normally distributed with a mean of $$21.25 ^\circ$$C and a standard deviation of $$2 ^\circ$$C.

$$\large \frac{value - \mu}{\sigma} = \text{z-score}$$

**What proportion of temperatures are between** $$19.63^\circ$$ **C and** $$20.53^\circ$$ **C?**\
*You may round your answer to four decimal places.*

1. Let's find the z-scores for $$19.63^\circ$$C and $$20.53^\circ$$C:

   $$z_1 = \frac{19.63 - 21.25}{2} = \frac{-1.62}{2} = -0.81$$

   $$z_2 = \frac{20.53 - 21.25}{2} = \frac{-0.72}{2} = -0.36$$
2. We want to find the proportion of temperatures between these two z-scores:

   ![](https://cdn.kastatic.org/ka-perseus-graphie/15590e61d44ea2eddd5f740df3b15ebc90f64407.svg)
3. Looking up $$z_1 = -0.81$$ on the z-table, we see that $$0.2090$$ of temperatures are **below** $$\blueD{19.63}^\circ$$C:

   ![](https://cdn.kastatic.org/ka-perseus-graphie/e60def34b3e0faadfda0eaf530d40d2015c2044e.svg)
4. Looking up $$z_2 = -0.36$$ on the z-table, we see that $$0.3594$$ of temperatures are **below** $$\goldD{20.53}^\circ$$C:

   ![](https://cdn.kastatic.org/ka-perseus-graphie/c06e2cbc867d90958150f433a3b07a455d6dc4f7.svg)
5. To find the area between $$z_1$$ and $$z_2$$, we can subtract the area below $$z_1$$ from the area below $$z_2$$:

   $$0.3594 - 0.2090 = \greenD{0.1504}$$

   ![](https://cdn.kastatic.org/ka-perseus-graphie/15590e61d44ea2eddd5f740df3b15ebc90f64407.svg)
6. The answer: $$\greenD{0.1504}$$
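The same steps can be checked without a z-table. This is a sketch using `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration. Because a z-table rounds each area to four decimal places before subtracting, the unrounded difference may differ from the table-based answer in the fourth decimal place.

```python
from statistics import NormalDist

# Temperatures: normally distributed with mean 21.25 and standard deviation 2
temperatures = NormalDist(mu=21.25, sigma=2)

below_low = temperatures.cdf(19.63)    # area below z_1 = -0.81
below_high = temperatures.cdf(20.53)   # area below z_2 = -0.36

# Unrounded proportion between the two temperatures
proportion = below_high - below_low

# Mimic the z-table: round each area to four decimals, then subtract
table_style = round(round(below_high, 4) - round(below_low, 4), 4)

print(f"below 19.63: {below_low:.4f}")
print(f"below 20.53: {below_high:.4f}")
print(f"between (unrounded inputs): {proportion:.6f}")
print(f"between (table-style): {table_style}")
```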

