Exploring bivariate numerical data
Correlation coefficient r
The correlation coefficient $r$ measures the direction and strength of a linear relationship. Calculating $r$ is fairly involved, so we usually rely on technology for the computations. We focus on understanding what $r$ says about a scatterplot. Here are some facts about $r$:
It always has a value between $-1$ and $1$.
Strong positive linear relationships have values of $r$ closer to $1$.
Strong negative linear relationships have values of $r$ closer to $-1$.
Weaker relationships have values of $r$ closer to $0$.
The formula is $r = \frac{1}{n-1}\sum z_x z_y$, where $z_x$ stands for the z-score of each x value, $z_y$ stands for the z-score of each y value, and $n$ is the number of data points.
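The z-score description of $r$ can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard formula $r = \frac{1}{n-1}\sum z_x z_y$ with sample standard deviations; the function name `correlation` and the data are hypothetical and do not come from the notes.

```python
import math

def correlation(xs, ys):
    """Pearson r computed as the sum of z-score products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Multiply each pair of z-scores, add them up, and divide by n - 1.
    products = (((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))  # about 0.775: a moderately strong positive linear relationship
```

In practice a statistics package would do this computation; the sketch only shows where the number comes from.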
Example:
Residuals and least squares regression
We calculate a residual as the difference between the point's y-value and the line's y-value at a given x-value.
For example, the residual for the point is :
The closer a data point's residual is to 0, the better the fit. In this case, the line fits the point better than it fits the point .
residual = actual - predicted
The predicted value $\hat{y}$ comes from the given regression equation, so
residual = $y - \hat{y}$
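The comparison of residuals can be sketched in Python. This is a minimal illustration, assuming a line of the form $\hat{y} = a + bx$; the helper `residual`, the coefficients, and the two points are hypothetical values chosen for the example, not the ones from the notes.

```python
def residual(x, y, a, b):
    """Return actual minus predicted for the point (x, y) and the line y_hat = a + b*x."""
    predicted = a + b * x
    return y - predicted

# Hypothetical line y_hat = 1 + 2x and two hypothetical data points.
a, b = 1.0, 2.0
p1 = (2, 5.5)  # predicted 5.0, so the residual is 0.5
p2 = (3, 5.0)  # predicted 7.0, so the residual is -2.0

r1 = residual(*p1, a, b)
r2 = residual(*p2, a, b)

# The point whose residual is closer to 0 is the one the line fits better.
better = p1 if abs(r1) < abs(r2) else p2
print(r1, r2, better)
```

A positive residual means the point lies above the line, and a negative residual means it lies below.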
Example:
Quiz
A limnologist takes samples from a creek on several days and counts the number of flatworms in each sample. The limnologist wants to look at the relationship between the temperature of the creek and the number of flatworms in the sample. The data show a linear pattern with the summary statistics shown below:
mean | standard deviation | |
Find the equation of the least-squares regression line for predicting the number of flatworms from the creek temperature.
Least-squares regression equation
The equation for the least-squares regression line for predicting $y$ from $x$ is of the form: $\hat{y} = a + bx$
Finding the slope
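We can determine the slope as follows:
$$b = r \cdot \frac{s_y}{s_x}$$
where $r$ is the correlation coefficient and $s_x$ and $s_y$ are the standard deviations of $x$ and $y$.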
In our case,
Finding the y-intercept
Because the regression line passes through the point $(\bar{x}, \bar{y})$, we can find the y-intercept as follows:
$$a = \bar{y} - b\bar{x}$$
In our case,
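The slope and intercept steps can be sketched in Python. This is a minimal illustration, assuming the standard relations $b = r \cdot s_y / s_x$ and $a = \bar{y} - b\bar{x}$; the function name `regression_line` and every number below are hypothetical and are not the creek data from the quiz.

```python
def regression_line(mean_x, s_x, mean_y, s_y, r):
    """Return (a, b) for the least-squares line y_hat = a + b*x from summary statistics."""
    if s_x <= 0:
        raise ValueError("the standard deviation of x must be positive")
    b = r * s_y / s_x          # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical summary statistics, made up for illustration only.
a, b = regression_line(mean_x=10.0, s_x=2.0, mean_y=50.0, s_y=8.0, r=0.5)
print(f"y_hat = {a} + {b}x")  # y_hat = 30.0 + 2.0x
```

Plugging the mean of $x$ back into the resulting line returns the mean of $y$, which is a quick check on the arithmetic.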
Answer
Residual
Joe sells used cars. He recorded the age (in years) of each car on his lot along with the number of kilometers it had been driven. After plotting his results, Joe noticed that the relationship between the two variables was fairly linear, so he used the data to calculate the following least squares regression equation for predicting distance driven from the age of the car:
What is the residual of a car that is 2 years old and has been driven ?
What is a residual?
Residuals are errors. More specifically, they are the differences between the observed value of the response variable and the value predicted by the least squares regression line.
$$\text{residual} = \text{observed value} - \text{predicted value}$$
or
$$\boxed{\text{residual} = y - \hat{y}}$$
Calculating the predicted value
We can predict the distance driven for a 2-year-old car using the least squares regression line like this:
Calculating the residual
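As a worked illustration of the two steps, assume a hypothetical regression equation $\hat{y} = 10000 + 15000x$ (with $x$ the age in years and $\hat{y}$ the predicted distance in kilometers) and a hypothetical observed distance of $42000$ km for a 2-year-old car. Neither the equation nor the observed distance is taken from Joe's data; they only show the arithmetic.

$$\hat{y} = 10000 + 15000(2) = 40000$$

$$\text{residual} = y - \hat{y} = 42000 - 40000 = 2000$$

Under these assumed numbers the residual is positive, meaning the car was driven $2000$ km farther than the line predicts for its age.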