Central Limit Theorem

The central limit theorem (CLT) states that, for a large enough sample size $n$, the distribution of the sample mean approaches a normal distribution. This holds for a sample of independent, identically distributed random variables drawn from any distribution with a finite standard deviation.

Let $\{X_1, X_2, X_3, \ldots, X_n\}$ be a random data set of size $n$, that is, a sequence of independent and identically distributed random variables drawn from a distribution with expected value $\mu$ and finite variance $\sigma^2$. The sample average is:

$$s_n := \frac{\sum_{i=1}^{n} X_i}{n}$$

For large $n$, the distribution of the sample sum $S_n = \sum_{i=1}^{n} X_i$ is close to the normal distribution $N(\mu^\prime, \sigma^\prime)$ with mean $\mu^\prime$ and standard deviation $\sigma^\prime$, where:

  • $\mu^\prime = n \times \mu$

  • $\sigma^\prime = \sqrt{n} \times \sigma$
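The two scaling rules above can be checked empirically. The following is a minimal simulation sketch, not part of the original notes: it assumes a Uniform(0, 1) population (mean 0.5, standard deviation $\sqrt{1/12}$), and the helper name `simulate_sample_sums`, the sample size, the trial count, and the seed are all illustrative choices.

```python
import math
import random
import statistics

def simulate_sample_sums(n, trials, seed=0):
    """Draw `trials` independent sums of n Uniform(0, 1) variables."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() for _ in range(n)) for _ in range(trials)]

n = 50
mu = 0.5                   # mean of Uniform(0, 1)
sigma = math.sqrt(1 / 12)  # standard deviation of Uniform(0, 1)

sums = simulate_sample_sums(n, trials=20_000)

# CLT prediction for the sum: mean n * mu, standard deviation sqrt(n) * sigma
print("predicted mean:", n * mu)
print("simulated mean:", statistics.fmean(sums))
print("predicted std: ", math.sqrt(n) * sigma)
print("simulated std: ", statistics.pstdev(sums))
```

The simulated values should land close to the predicted ones, and the agreement should tighten as the number of trials grows.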

Task

A large elevator can transport a maximum of $9800$ pounds. Suppose a load of cargo containing $49$ boxes must be transported via the elevator. The box weight of this type of cargo follows a distribution with a mean of $\mu = 205$ pounds and a standard deviation of $\sigma = 15$ pounds. Based on this information, what is the probability that all $49$ boxes can be safely loaded into the freight elevator and transported?

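import math

def less_than_boundary_cdf(x, mean, std):
    # Normal CDF at x for N(mean, std), rounded to 4 decimal places
    return round(0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2)))), 4)

m = int(input())     # maximum load the elevator can carry
n = int(input())     # number of boxes
mean = int(input())  # mean weight of one box
devi = int(input())  # standard deviation of one box's weight

# The total weight of n boxes is approximately N(n * mean, sqrt(n) * devi)
print(less_than_boundary_cdf(m, n * mean, math.sqrt(n) * devi))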
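As a cross-check that does not read from standard input, the same probability can be computed with the standard library's `statistics.NormalDist` (Python 3.8+). This is a sketch with the task's numbers hard-coded; the helper name `prob_sum_at_most` is an illustrative assumption, not part of the original solution. By hand, $\mu^\prime = 49 \times 205 = 10045$, $\sigma^\prime = \sqrt{49} \times 15 = 105$, so $z = (9800 - 10045)/105 \approx -2.33$.

```python
import math
from statistics import NormalDist

def prob_sum_at_most(limit, n, mean, std):
    """P(S_n <= limit) under the CLT normal approximation for the sum."""
    approx = NormalDist(mu=n * mean, sigma=math.sqrt(n) * std)
    return approx.cdf(limit)

# Elevator task: 49 boxes, mean 205 lb, standard deviation 15 lb, limit 9800 lb
p = prob_sum_at_most(9800, 49, 205, 15)
print(round(p, 4))  # 0.0098
```

The chance that the full load fits is therefore a little under 1%.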

Task

The number of tickets purchased by each student for the University X vs. University Y football game follows a distribution that has a mean of $\mu = 2.4$ and a standard deviation of $\sigma = 2.0$.

A few hours before the game starts, $100$ eager students line up to purchase last-minute tickets. If there are only $250$ tickets left, what is the probability that all $100$ students will be able to purchase tickets?

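import math

def less_than_boundary_cdf(x, mean, std):
    # Normal CDF at x for N(mean, std), rounded to 4 decimal places
    return round(0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2)))), 4)

m = int(input())       # tickets left
n = int(input())       # number of students in line
mean = float(input())  # mean tickets bought per student
devi = float(input())  # standard deviation of tickets per student

# Total demand of n students is approximately N(n * mean, sqrt(n) * devi)
print(less_than_boundary_cdf(m, n * mean, math.sqrt(n) * devi))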
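For reference, the arithmetic behind this task can be worked by hand. The derivation below is a sketch that applies the normal approximation for the sum to the task's numbers; $\Phi$ denotes the standard normal CDF, a symbol not used in the original notes.

```latex
\begin{aligned}
\mu^\prime    &= n \times \mu = 100 \times 2.4 = 240 \\
\sigma^\prime &= \sqrt{n} \times \sigma = \sqrt{100} \times 2.0 = 20 \\
z             &= \frac{250 - \mu^\prime}{\sigma^\prime} = \frac{250 - 240}{20} = 0.5 \\
P(S_{100} \le 250) &\approx \Phi(0.5) \approx 0.6915
\end{aligned}
```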

Task

You have a sample of $100$ values from a population with mean $\mu = 500$ and standard deviation $\sigma = 80$. Compute the interval that covers the middle $95\%$ of the distribution of the sample mean; in other words, compute $A$ and $B$ such that $P(A < x < B) = 0.95$. Use the value $z = 1.96$. Note that $z$ is the z-score.

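import math

zScore = 1.96  # z-score for the middle 95% of a normal distribution
std = 80       # population standard deviation
n = 100        # sample size
mean = 500     # population mean

# The sample mean has standard deviation std / sqrt(n)
marginOfError = zScore * (std / math.sqrt(n))
print(mean - marginOfError)  # lower bound A
print(mean + marginOfError)  # upper bound B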

The margin of error (marginOfError in the code) is given by:

$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
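The task supplies $z = 1.96$ directly, but $z_{\alpha/2}$ can also be derived from the desired coverage. The sketch below assumes Python 3.8+ and uses `statistics.NormalDist().inv_cdf`; the function name `sample_mean_interval` and its `coverage` parameter are illustrative, not part of the original solution.

```python
import math
from statistics import NormalDist

def sample_mean_interval(mean, std, n, coverage=0.95):
    """Return (A, B) covering the central `coverage` of the sample-mean distribution."""
    # Two-sided critical value: leave (1 - coverage) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)
    margin = z * std / math.sqrt(n)  # E = z * sigma / sqrt(n)
    return mean - margin, mean + margin

low, high = sample_mean_interval(500, 80, 100)
print(round(low, 2), round(high, 2))  # 484.32 515.68
```

For 95% coverage the derived critical value is about 1.95996, so the interval agrees with the $z = 1.96$ result to two decimal places.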
