Modeling data distribution

z-score: how many standard deviations $\sigma$ a value lies away from the mean $\mu$

Example: value 65, with $\mu = 81$ and $\sigma = 6.3$

$\frac{\text{value} - \mu}{\sigma} \Longrightarrow \frac{65 - 81}{6.3} \approx -2.54$

A z-score measures exactly how many standard deviations above or below the mean a data point is. Here's the formula for calculating a z-score:

$z = \frac{\text{data point} - \text{mean}}{\text{standard deviation}} \Longrightarrow z = \frac{x - \mu}{\sigma}$
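As a minimal sketch, the formula can be written as a small Python function. The name `z_score` and its parameter names are illustrative choices rather than anything defined in these notes; the numbers are the example from earlier (value 65, mean 81, standard deviation 6.3).

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Example from the notes: value 65, mean 81, standard deviation 6.3
z = z_score(65, 81, 6.3)
print(round(z, 2))  # -2.54
```

The negative sign shows that 65 lies below the mean of 81.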

Here are some important facts about z-scores:

  • A positive z-score says the data point is above average.

  • A negative z-score says the data point is below average.

  • A z-score close to $0$ says the data point is close to average.

  • A data point can be considered unusual if its z-score is above $3$ or below $-3$.
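The facts above can be sketched as a small helper that reads a z-score. The function name `describe_z_score` and the `unusual_cutoff` parameter are hypothetical; the default cutoff of 3 is the rule of thumb from the list, not a strict statistical test.

```python
def describe_z_score(z: float, unusual_cutoff: float = 3.0) -> tuple[str, bool]:
    """Return (position relative to the mean, whether the point counts as unusual)."""
    if z > 0:
        position = "above average"
    elif z < 0:
        position = "below average"
    else:
        position = "exactly average"
    # Rule of thumb from the notes: unusual if z is above 3 or below -3
    is_unusual = abs(z) > unusual_cutoff
    return position, is_unusual


print(describe_z_score(-2.54))  # ('below average', False)
print(describe_z_score(3.2))    # ('above average', True)
```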

The standard deviation and IQR change only when every data point is multiplied or divided by a constant, not when a constant is added or subtracted. The mean and median change either way.
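A quick way to check this is to shift and scale a small data set and compare the summaries. The sketch below uses Python's standard `statistics` module; the sample values, the helper name `summarize`, and the constants 10 and 3 are made up for illustration.

```python
import statistics


def summarize(data):
    """Return measures of center (mean, median) and spread (std dev, IQR)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std_dev": statistics.pstdev(data),  # population standard deviation
        "iqr": q3 - q1,
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical sample
shifted = [x + 10 for x in data]     # add a constant to every point
scaled = [x * 3 for x in data]       # multiply every point by a constant

for label, values in [("original", data), ("shifted +10", shifted), ("scaled x3", scaled)]:
    print(label, summarize(values))
```

In the printed summaries, shifting moves the mean and median by 10 but leaves the standard deviation and IQR alone, while scaling multiplies all four by 3.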

Normal distribution: Empirical Rule (68-95-99.7%)

Standard normal distribution:

$\mu = 0 \ (\text{mean})$

$\sigma = 1 \ (\text{standard deviation})$

What is a normal distribution?

Early statisticians noticed the same shape coming up over and over again in different distributions—so they named it the normal distribution.

Normal distributions have the following features:

  • symmetric bell shape

  • mean and median are equal; both located at the center of the distribution

  • ≈ 68% of the data falls within 1 standard deviation of the mean

  • ≈ 95% of the data falls within 2 standard deviations of the mean

  • ≈ 99.7% of the data falls within 3 standard deviations of the mean
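The three percentages can be checked numerically. This sketch uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper name `share_within` is a hypothetical choice for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The printed values are roughly 0.6827, 0.9545, and 0.9973, which the empirical rule rounds to 68%, 95%, and 99.7%.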

Quiz

A set of average city temperatures in August is normally distributed with a mean of $21.25^\circ\text{C}$ and a standard deviation of $2^\circ\text{C}$.

$\frac{\text{value} - \mu}{\sigma} = \text{z-score}$

What proportion of temperatures are between $19.63^\circ\text{C}$ and $20.53^\circ\text{C}$? You may round your answer to four decimal places.

  1. Let's find the z-scores for $19.63^\circ\text{C}$ and $20.53^\circ\text{C}$:

    $z_1 = \frac{19.63 - 21.25}{2} = \frac{-1.62}{2} = -0.81$

    $z_2 = \frac{20.53 - 21.25}{2} = \frac{-0.72}{2} = -0.36$

  2. We want to find the proportion of temperatures between these two z-scores:

    [Figure: normal curve with $z_1$ and $z_2$ marked]

  3. Looking up $z_1 = -0.81$ on the z-table, we see that $0.2090$ of temperatures are below $19.63^\circ\text{C}$:

    [Figure: normal curve with $z_1$ marked]

  4. Looking up $z_2 = -0.36$ on the z-table, we see that $0.3594$ of temperatures are below $20.53^\circ\text{C}$:

    [Figure: normal curve with $z_2$ marked]

  5. To find the area between $z_1$ and $z_2$, we can subtract the area below $z_1$ from the area below $z_2$:

    $0.3594 - 0.2090 = 0.1504$

    [Figure: normal curve with $z_1$ and $z_2$ marked]

  6. The answer: $0.1504$
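The quiz steps can be reproduced without a printed z-table. The sketch below assumes Python's standard `statistics.NormalDist` as a stand-in for the table lookup; the variable names are illustrative. Because z-table entries are rounded to four decimal places, the unrounded difference can differ slightly from the table-based answer in the last digit.

```python
from statistics import NormalDist

# Temperatures: mean 21.25 degrees C, standard deviation 2 degrees C
temps = NormalDist(mu=21.25, sigma=2)

low, high = 19.63, 20.53
below_low = temps.cdf(low)    # proportion below 19.63 (z_1 = -0.81)
below_high = temps.cdf(high)  # proportion below 20.53 (z_2 = -0.36)

# Mimic the z-table: round each area to four places, then subtract
table_style = round(below_high, 4) - round(below_low, 4)
# Subtract first, without intermediate rounding
unrounded = below_high - below_low

print(f"below {low}: {below_low:.4f}")
print(f"below {high}: {below_high:.4f}")
print(f"table-style difference: {table_style:.4f}")
print(f"unrounded difference: {unrounded:.6f}")
```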
