Exploring bivariate numerical data
The correlation coefficient $r$ measures the direction and strength of a linear relationship. Calculating $r$ is pretty complex, so we usually rely on technology for the computations. We focus on understanding what $r$ says about a scatterplot.
Here are some facts about $r$:
It always has a value between $-1$ and $1$.
Strong positive linear relationships have values of $r$ closer to $1$.
Strong negative linear relationships have values of $r$ closer to $-1$.
Weaker relationships have values of $r$ closer to $0$.
The correlation coefficient can be computed as $r = \dfrac{1}{n-1}\sum z_x z_y$, where $z_x$ stands for the z-score of each $x$-value and $z_y$ stands for the z-score of each $y$-value.
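To make the z-score description concrete, here is a minimal Python sketch. It assumes the usual sample formula (the sum of the products of paired z-scores, divided by $n-1$, with sample standard deviations). The function name `correlation` and the data are hypothetical, chosen only for illustration.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r: sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when one variable has no spread")
    # Multiply each point's z-scores, add the products, divide by n - 1.
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, not taken from the article.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))
```

Points that fall exactly on an upward-sloping line give $r = 1$, and points that fall exactly on a downward-sloping line give $r = -1$, which matches the facts listed above.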
Example:
We calculate a residual as the difference between the point's $y$-value and the line's $y$-value at a given $x$-value.
For example, the residual for the point is :
residual = actual - predicted
Example:
A limnologist takes samples from a creek on several days and counts the number of flatworms in each sample. The limnologist wants to look at the relationship between the temperature of the creek and the number of flatworms in the sample. The data show a linear pattern with the summary statistics shown below:
mean
standard deviation
Find the equation of the least-squares regression line for predicting the number of flatworms from the creek temperature.
Joe sells used cars. He recorded the age (in years) of each car on his lot along with the number of kilometers it had been driven. After plotting his results, Joe noticed that the relationship between the two variables was fairly linear, so he used the data to calculate the following least squares regression equation for predicting distance driven from the age of the car:
What is a residual?
Residuals are errors. More specifically, they are the differences between the observed value of the response variable and the value predicted by the least squares regression line.
$$\text{residual} = \text{observed value} - \text{predicted value}$$
or
$$\boxed{\text{residual} = y - \hat{y}}$$
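As a quick illustration of residual = observed minus predicted, here is a small Python sketch. The line $\hat{y} = 1 + 2x$ and the two points are hypothetical and are not the values from the examples in this article. The helper names `predict` and `residual` are also made up for this sketch.

```python
def predict(a, b, x):
    """Predicted value y-hat from a line with intercept a and slope b."""
    return a + b * x

def residual(a, b, x, y):
    """Residual = observed y minus predicted y-hat."""
    return y - predict(a, b, x)

# Hypothetical line y-hat = 1 + 2x (an assumption for illustration only).
a, b = 1.0, 2.0

above = residual(a, b, 3, 8.0)   # predicted 7.0, so the point lies above the line
below = residual(a, b, 4, 7.5)   # predicted 9.0, so the point lies below the line
print(above, below)
```

A positive residual means the point sits above the line, and a negative residual means it sits below the line.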
Calculating the predicted value
We can predict the distance driven for a 2-year-old car using the least squares regression line like this:
Calculating the residual
The closer a data point's residual is to 0, the better the fit. In this case, the line fits the point better than it fits the point .
(given equation)
residual =
creek temperature
number of flatworms
The equation for the least-squares regression line for predicting $y$ from $x$ is of the form:
$$\hat{y} = a + bx$$
We can determine the slope as follows:
$$b = r\left(\dfrac{s_y}{s_x}\right)$$
In our case,
Because the regression line passes through the point $(\bar{x}, \bar{y})$, we can find the $y$-intercept as follows:
$$a = \bar{y} - b\bar{x}$$
In our case,
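The two steps (slope first, then intercept) can be sketched in Python. This assumes the standard relationships $b = r \cdot s_y / s_x$ and $a = \bar{y} - b\bar{x}$. The function name `least_squares_from_summary` and every number below are hypothetical. They are not the flatworm statistics from the example.

```python
def least_squares_from_summary(mean_x, s_x, mean_y, s_y, r):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    if s_x <= 0:
        raise ValueError("s_x must be positive")
    b = r * s_y / s_x          # slope: r times the ratio of standard deviations
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical summary statistics (assumptions for illustration only).
a, b = least_squares_from_summary(mean_x=10.0, s_x=2.0,
                                  mean_y=50.0, s_y=8.0, r=0.5)
print(f"y-hat = {a} + {b}x")
```

Plugging the mean of $x$ back into the resulting equation returns the mean of $y$, which is a handy check on the arithmetic.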
What is the residual of a car that is 2 years old and has been driven ?