What Is the Coefficient of Determination?
The coefficient of determination is a statistical measurement that examines how well differences in one variable can be explained by differences in a second variable when predicting the outcome of a given event. In other words, this coefficient, more commonly known as R-squared (or R²), assesses the strength of the linear relationship between two variables and is heavily relied on by researchers conducting trend analysis. As an example of its application, the coefficient might address the following question: if a woman becomes pregnant on a certain day, what is the likelihood that she will deliver her baby on a particular date in the future? In this scenario, the metric aims to measure the strength of the relationship between two related events: conception and birth.
Key Takeaways
- The coefficient of determination is a statistical measure of how well a model explains the variability in a set of data.
- The coefficient of determination is used to explain how much of the variability of one factor can be accounted for by its relationship to another factor.
- This coefficient is commonly known as R-squared (or R²), and is sometimes referred to as the "goodness of fit."
- This measure is represented as a value between 0.0 and 1.0, where a value of 1.0 indicates a perfect fit, and is thus a highly reliable model for future forecasts, while a value of 0.0 indicates that the model fails to fit the data at all.
Understanding the Coefficient of Determination
The coefficient of determination is a measurement used to explain how much of the variability of one factor can be accounted for by its relationship to another related factor. This correlation, known as the "goodness of fit," is represented as a value between 0.0 and 1.0. A value of 1.0 indicates a perfect fit, and is thus a highly reliable model for future forecasts, while a value of 0.0 would indicate that the calculation fails to model the data at all. A value of 0.20, for example, suggests that 20% of the variability in the dependent variable is explained by the independent variable, while a value of 0.50 suggests that 50% of that variability is explained, and so forth.
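As a quick illustration of how such a value arises, the sketch below fits a least squares line to a small, made-up data set and computes R² as the proportion of explained variability (the data and variable names are hypothetical, for illustration only):

```python
# Minimal sketch: computing R-squared for a simple linear fit.
# The data below are hypothetical.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares used throughout this section.
ss_xx = sum((x - mean_x) ** 2 for x in xs)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_yy = sum((y - mean_y) ** 2 for y in ys)

# Least squares slope and intercept.
beta1 = ss_xy / ss_xx
beta0 = mean_y - beta1 * mean_x

# Sum of squared errors about the regression line.
sse = sum((y - (beta1 * x + beta0)) ** 2 for x, y in zip(xs, ys))

# R-squared: the proportion of variability in y explained by the fit.
r_squared = 1 - sse / ss_yy  # close to 1.0 here: a tight linear fit
```

Because these points lie nearly on a straight line, the resulting R² is close to 1.0; scattering the points further from the line would drive it toward 0.0.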
Graphing the Coefficient of Determination
On a graph, the goodness of fit measures the distance between a fitted line and the data points scattered around it. A tight set of data will have a regression line that lies close to the points, indicating a high level of fit: the distance between the line and the data is small. Although a good fit has an R² close to 1.0, this number alone cannot determine whether the data points or predictions are biased. Nor does it tell analysts whether the coefficient of determination value is intrinsically good or bad. It is at the discretion of the user to evaluate the meaning of this correlation, and how it may be applied in the context of future trend analyses.
If the scatter diagram of a set of \((x,y)\) pairs shows neither an upward nor a downward trend, then the horizontal line \(\hat{y} =\overline{y}\) fits it well, as illustrated in Figure \(\PageIndex{1}\). The lack of any upward or downward trend means that when an element of the population is selected at random, knowing the value of the measurement \(x\) for that element is not helpful in predicting the value of the measurement \(y\).
If the scatter diagram shows a linear trend upward or downward then it is useful to compute the least squares regression line
\[\hat{y} =\hat{β}_1x+\hat{β}_0\]
and use it in predicting \(y\). Figure \(\PageIndex{2}\) illustrates this: in each panel we have plotted the height and weight data of Section 10.1, with the average value line \(\hat{y} =\overline{y}\) superimposed in the left panel and the least squares regression line superimposed in the right panel. The errors are indicated graphically by the vertical line segments.
The sum of the squared errors computed for the regression line, \(SSE\), is smaller than the sum of the squared errors computed for any other line. In particular it is less than the sum of the squared errors computed using the line \(\hat{y}=\overline{y}\), which sum is actually the number \(SS_{yy}\) that we have seen several times already. A measure of how useful it is to use the regression equation for prediction of \(y\) is how much smaller \(SSE\) is than \(SS_{yy}\). In particular, the proportion of the sum of the squared errors for the line \(\hat{y} =\overline{y}\) that is eliminated by going over to the least squares regression line is
\[\dfrac{SS_{yy}−SSE}{SS_{yy}}=\dfrac{SS_{yy}}{SS_{yy}}−\dfrac{SSE}{SS_{yy}}=1−\dfrac{SSE}{SS_{yy}}\]
We can think of \(SSE/SS_{yy}\) as the proportion of the variability in \(y\) that cannot be accounted for by the linear relationship between \(x\) and \(y\), since it is still there even when \(x\) is taken into account in the best way possible (using the least squares regression line; remember that \(SSE\) is the smallest the sum of the squared errors can be for any line). Seen in this light, the coefficient of determination, the complementary proportion of the variability in \(y\), is the proportion of the variability in all the \(y\) measurements that is accounted for by the linear relationship between \(x\) and \(y\).
In the context of linear regression the coefficient of determination is always the square of the correlation coefficient \(r\) discussed in Section 10.2. Thus the coefficient of determination is denoted \(r^2\), and we have two additional formulas for computing it.
Definition: coefficient of determination
The coefficient of determination of a collection of \((x,y)\) pairs is the number \(r^2\) computed by any of the following three expressions:
\[r^2=\dfrac{SS_{yy}−SSE}{SS_{yy}}=\dfrac{SS^2_{xy}}{SS_{xx}SS_{yy}}=\hat{β}_1 \dfrac{SS_{xy}}{SS_{yy}}\]
It measures the proportion of the variability in \(y\) that is accounted for by the linear relationship between \(x\) and \(y\).
If the correlation coefficient \(r\) is already known then the coefficient of determination can be computed simply by squaring \(r\), as the notation indicates, \(r^2=(r)^2\).
Example \(\PageIndex{1}\)
The value of used vehicles of the make and model discussed in "Example 10.4.2" in Section 10.4 varies widely. The most expensive automobile in the sample in Table 10.4.3 has value \(\$30,500\), which is nearly half again as much as the least expensive one, which is worth \(\$20,400\). Find the proportion of the variability in value that is accounted for by the linear relationship between age and value.
Solution:
The proportion of the variability in value \(y\) that is accounted for by the linear relationship between it and age \(x\) is given by the coefficient of determination, \(r^2\). Since the correlation coefficient \(r\) was already computed in "Example 10.4.2" in Section 10.4 as
\[r=-0.819\\ r^2=(-0.819)^2=0.671\]
About \(67\%\) of the variability in the value of this vehicle can be explained by its age.
Example \(\PageIndex{2}\)
Use each of the three formulas for the coefficient of determination to compute its value for the example of ages and values of vehicles.
Solution:
In "Example 10.4.2" in Section 10.4 we computed the exact values
\[SS_{xx}=14\\ SS_{xy}=-28.7\\ SS_{yy}=87.781\\ \hat{\beta _1}=-2.05\]
In "Example 10.4.4" in Section 10.4 we computed the exact value
\[SSE=28.946\]
Inserting these values into the formulas in the definition, one after the other, gives
\[r^2=\dfrac{SS_{yy}−SSE}{SS_{yy}}=\dfrac{87.781−28.946}{87.781}=0.6702475479\]
\[r^2= \dfrac{SS^2_{xy}}{SS_{xx}SS_{yy}}=\dfrac{(−28.7)^2}{(14)(87.781)}=0.6702475479\]
\[r^2=\hat{β}_1 \dfrac{SS_{xy}}{SS_{yy}}=(−2.05)\dfrac{−28.7}{87.781}=0.6702475479\]
which rounds to \(0.670\). The discrepancy between the value here and in the previous example arises because a rounded value of \(r\) from "Example 10.4.2" was used there. The actual value of \(r\) before rounding is \(-0.8186864772\), which when squared gives the value for \(r^2\) obtained here.
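These computations can be reproduced in a short script, using the exact sums of squares quoted above from Section 10.4:

```python
# Reproducing the three formulas for r^2 with the exact values
# computed in Section 10.4 for the vehicle age/value data.
ss_xx = 14
ss_xy = -28.7
ss_yy = 87.781
beta1 = -2.05
sse = 28.946

r2_first = (ss_yy - sse) / ss_yy           # (SSyy - SSE) / SSyy
r2_second = ss_xy ** 2 / (ss_xx * ss_yy)   # SSxy^2 / (SSxx * SSyy)
r2_third = beta1 * ss_xy / ss_yy           # beta1 * SSxy / SSyy
# Each expression rounds to 0.670.
```

All three expressions agree (up to floating point roundoff), as the definition requires; in practice one simply picks the formula whose ingredients are already at hand.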
The coefficient of determination \(r^2\) can always be computed by squaring the correlation coefficient \(r\) if it is known. Any one of the defining formulas can also be used. Typically one would make the choice based on which quantities have already been computed. What should be avoided is trying to compute \(r\) by taking the square root of \(r^2\), if it is already known, since it is easy to make a sign error this way. To see what can go wrong, suppose \(r^2=0.64\). Taking the square root of a positive number with any calculating device will always return a positive result. The square root of \(0.64\) is \(0.8\). However, the actual value of \(r\) might be the negative number \(-0.8\).
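The sign pitfall is easy to demonstrate in a few lines (the numbers here are hypothetical):

```python
import math

r = -0.8                  # a negative correlation
r_squared = r ** 2        # 0.64 -- squaring discards the sign

# Taking the square root recovers only the magnitude:
recovered = math.sqrt(r_squared)  # positive, even though r is negative
```

Here `recovered` is approximately 0.8, not the original \(-0.8\), so the square root alone cannot tell us whether the underlying relationship is positive or negative.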
Key Takeaway
- The coefficient of determination \(r^2\) estimates the proportion of the variability in the variable \(y\) that is explained by the linear relationship between \(y\) and the variable \(x\).
- There are several formulas for computing \(r^2\). The choice of which one to use can be based on which quantities have already been computed so far.