As in the Least Squares visualization, six data points are arranged in the coordinate plane and may be moved about at will. The least squares regression line for the six points is drawn.
Two buttons control hiding and showing of:
- the squares of the deviations of the points from the mean of their y values (the red squares), and
- the squares of the errors or residuals from the points to the regression line (the blue squares).
The sum of the areas of the red squares is shown as a large red square: the total squared deviation that we are trying to explain with a linear model. Likewise, the sum of the areas of the blue squares is shown as a large blue square: the total squared error that remains unexplained by the linear model.
The ratio of these two areas (blue / red) is the proportion of the deviation that remains unexplained. One minus this ratio is the proportion of the deviation that is explained; this is the r² value.
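The arithmetic behind the two large squares can be sketched in a few lines of Python. This is only an illustration: the six points are made up rather than taken from the visualization, and the names `fit_line` and `squares_summary` are hypothetical helpers, not part of the applet.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for the points (xs[i], ys[i])."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = y_bar - slope * x_bar
    return slope, intercept


def squares_summary(xs, ys):
    """Return (red, blue, r_squared) for the fitted line.

    red  = total area of the squared deviations from the mean of y
    blue = total area of the squared residuals from the regression line
    """
    slope, intercept = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    red = sum((y - y_bar) ** 2 for y in ys)
    blue = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Assumes red > 0, i.e. the y values are not all identical.
    return red, blue, 1 - blue / red


# Six hypothetical points, chosen only for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

red, blue, r2 = squares_summary(xs, ys)
print(f"red (total) = {red:.3f}")
print(f"blue (unexplained) = {blue:.3f}")
print(f"unexplained proportion = {blue / red:.3f}")
print(f"r^2 (explained proportion) = {r2:.3f}")
```

The sketch deliberately leaves out the degenerate cases (all x values equal, or all y values equal), where one of the divisions is undefined.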
- Why is it that the big blue square can never have more area than the big red square?
- Drag point P1 along the regression line until it becomes an outlier. Notice that as it gets farther and farther from the other five points, the r² value gets closer and closer to one. Explain why this is happening and comment on how a single point can produce a high correlation.
- Drag the six points until they all have approximately the same y-values. Notice that r² becomes nearly zero. Explain why this has to be so.
- Find two other arrangements of the six points that produce a zero r² value and explain why they do so in terms of the red and blue squares.
- Arrange the six points so that the r² value is one. Explain why your arrangement works.
- Is there any arrangement of the six points that has an r² value of one but in which the points are not all on a line? Why, or why not?
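The outlier exercise above can also be explored numerically. The following is a rough sketch under stated assumptions: the five clustered points are hypothetical, and P1 is moved along the fixed line y = x as a stand-in for dragging it along the regression line (which, in the visualization, shifts as the point moves). The function name `r_squared` is illustrative only.

```python
def r_squared(points):
    """1 - blue/red for the least squares line through the given (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    red = sum((y - y_bar) ** 2 for _, y in points)  # squared deviations from the mean
    blue = sum((y - (slope * x + intercept)) ** 2 for x, y in points)  # squared residuals
    return 1 - blue / red


# Five hypothetical points with no linear trend among themselves.
cluster = [(1, 2), (2, 1), (3, 3), (2, 3), (3, 1)]

# Assumed positions of P1, moving away from the cluster along y = x.
distances = [5, 10, 20, 40]
values = [r_squared(cluster + [(t, t)]) for t in distances]

for t, v in zip(distances, values):
    print(f"P1 at ({t}, {t}): r^2 = {v:.4f}")
```

In this sketch the printed values rise toward one as P1 moves away from the cluster; comparing how the red and blue totals each change at every step is one way to approach the explanation the exercise asks for.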