# Coefficient of determination

The **coefficient of determination**, often written $R^2$ and referred to as R-squared, is a measure of how well a regression line fits a set of data. In simple linear regression (one explanatory variable and an intercept), it equals the square of the Pearson correlation coefficient between the two variables. In that setting the value ranges from 0 to 1, with a higher coefficient of determination indicating a better fit.

A coefficient of determination of 0 indicates that the two variables have no linear correlation, while a coefficient of 1 indicates that they are perfectly linearly correlated, so that every data point lies on the regression line. Between these two extremes, the closer the value is to 1, the better the regression line fits the data.

## Example of Coefficient of determination

Let’s say we have the following data set:

- x = [1, 2, 3, 4, 5]
- y = [2, 4, 6, 8, 10]

By calculating the correlation coefficient $r$, we can determine the coefficient of determination $R^2$. The Pearson correlation coefficient $r$ can be calculated using the formula

\[r = \frac{\sum{xy} - \frac{1}{n}\sum{x}\sum{y}}{\sqrt{\sum{x^2} - \frac{1}{n}(\sum{x})^2}\sqrt{\sum{y^2} - \frac{1}{n}(\sum{y})^2}}\]

where $n$ is the number of data points.

In this case, $n = 5$ and

$\sum{x} = 15$, $\sum{y} = 30$, $\sum{xy} = 110$, $\sum{x^2} = 55$, $\sum{y^2} = 220$

Plugging these values into the formula gives us

\[r = \frac{110 - (15 \cdot 30) / 5}{\sqrt{55 - 15^2 / 5}\sqrt{220 - 30^2 / 5}} = \frac{110 - 90}{\sqrt{55 - 45}\sqrt{220 - 180}} = \frac{20}{\sqrt{10}\sqrt{40}} = \frac{20}{20} = 1\]

The coefficient of determination $R^2$ is then calculated by taking the square of this value

\[R^2 = 1^2 = 1\]

This means that the regression line has a coefficient of determination of 1, the largest possible value. The fit is perfect because every point in the data set lies exactly on the line $y = 2x$.
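The calculation for this data set can be checked numerically. The following is a minimal sketch in plain Python that evaluates the raw-sums formula for $r$ given above; the function name `pearson_r` is an illustrative choice, not something defined in the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from raw sums, as in the formula above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    # sqrt(A) * sqrt(B) is written as sqrt(A * B); the two are algebraically equal.
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

r = pearson_r(x, y)
r_squared = r ** 2
print(f"sum(xy) = {sum(a * b for a, b in zip(x, y))}")
print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

Because $y$ is an exact multiple of $x$ here, the script should report a correlation and a coefficient of determination of 1.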

In conclusion, for a simple linear regression the coefficient of determination is obtained by squaring the correlation coefficient, $R^2 = r^2$, and the worked example shows each step of that calculation.

## Formula of Coefficient of determination

For a simple linear regression, the formula for calculating the coefficient of determination is

\[R^2 = r^2\]

where $r$ is the correlation coefficient. The correlation coefficient is a measure of the strength of the linear relationship between two variables and ranges from -1 to +1. A correlation coefficient of 0 indicates no linear relationship between the two variables, while a correlation coefficient of 1 or -1 indicates a perfect linear relationship.

Because $r$ is squared, the coefficient of determination ranges from 0 to 1, with 0 indicating no linear correlation and 1 indicating a perfect fit of the regression line to the data.
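The identity $R^2 = r^2$ can be checked against a more general definition that the text does not state and that is assumed here: $R^2 = 1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared residuals of the least-squares line and $SS_{tot}$ is the total sum of squares around the mean of $y$. The sketch below uses a small hypothetical data set and illustrative function names.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

def r_squared_from_residuals(x, y):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line (assumed definition)."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def pearson_r(x, y):
    """Pearson correlation from centered sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, chosen so the fit is not perfect.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2_from_r = pearson_r(x, y) ** 2
r2_from_fit = r_squared_from_residuals(x, y)
print(f"r^2 = {r2_from_r:.4f}, 1 - SS_res/SS_tot = {r2_from_fit:.4f}")
```

For this data both routes give approximately 0.6, which is what the identity predicts for a straight-line fit with an intercept.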

## When to use Coefficient of determination

The coefficient of determination is especially useful when analyzing the relationships between variables. It measures how well a regression line fits a set of data, which shows how closely two variables are linearly related. It can also be used as one indicator of the accuracy of a predictive model: the higher the coefficient of determination, the more closely the model's predictions track the data on which it is evaluated.

## Types of Coefficient of determination

The coefficient of determination can be broken down into two types: adjusted and unadjusted. Unadjusted R-squared measures the fit alone and, for two variables, is the square of the correlation coefficient between them. Adjusted R-squared also takes into account the number of predictors in the model.

Unadjusted R-squared

\[R^2 = r^2\]

Adjusted R-squared

\[R^2_{adj} = 1 - (1-R^2)\frac{n-1}{n-p-1}\]

where $n$ is the sample size and $p$ is the number of predictors (explanatory variables) in the model, not counting the intercept.
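The adjustment can be sketched as a small Python function. It assumes that `p` counts the explanatory variables without the intercept, which is the usual reading of this formula; the function name and the input values are hypothetical.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n: sample size; p: number of predictors, assumed to exclude the intercept.
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for the adjustment to be defined")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Hypothetical values: the same unadjusted R^2 with more predictors
# gives a lower adjusted R^2.
one_predictor = adjusted_r_squared(0.6, n=20, p=1)
five_predictors = adjusted_r_squared(0.6, n=20, p=5)
print(f"p=1: {one_predictor:.4f}, p=5: {five_predictors:.4f}")
```

The penalty grows with `p`, so a model has to improve the fit enough to justify each additional predictor.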

Both versions evaluate the goodness of fit of a regression model to a set of data, and an unadjusted value of 1 indicates a perfect fit. Because unadjusted R-squared never decreases when a predictor is added to a least-squares model, adjusted R-squared is the fairer measure when comparing models with different numbers of predictors.

## Advantages of Coefficient of determination

- The coefficient of determination is an easy-to-understand measure of how strongly two variables are linearly correlated.
- It provides an easy way to compare the correlation of two different sets of data.
- It can be used to measure the success of a regression line in fitting a set of data points.

## Disadvantages of Coefficient of determination

- It captures only linear relationships, so it can be close to 0 even when a strong non-linear relationship exists.
- It is sensitive to outliers: a single extreme point can raise or lower the value substantially, so it may not reflect the relationship in the rest of the data.
- Because the correlation coefficient is squared, it does not provide any information on the direction of the relationship between two variables.
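Two of the points above, the loss of direction and the blindness to non-linear relationships, can be illustrated with a short Python sketch. The data sets are hypothetical and chosen to make the effects exact.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from centered sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [-2, -1, 0, 1, 2]
cases = {
    "rising line  y = 2x":  [2 * v for v in x],
    "falling line y = -2x": [-2 * v for v in x],
    "parabola     y = x^2": [v * v for v in x],
}

results = {}
for name, y in cases.items():
    r = pearson_r(x, y)
    results[name] = (r, r ** 2)
    print(f"{name}: r = {r:+.2f}, R^2 = {r ** 2:.2f}")
```

The rising and falling lines have opposite signs of $r$ but the same $R^2$, while the parabola yields an $R^2$ of 0 even though $y$ is fully determined by $x$.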

## Limitations of Coefficient of determination

Despite the usefulness of the coefficient of determination in measuring the strength of the linear relationship between two variables, there are some limitations to be aware of.

- The coefficient of determination only measures the strength of a linear relationship, and so is not suitable for measuring non-linear relationships.
- The coefficient of determination is not robust to outliers, which can inflate or deflate its value.
- The coefficient of determination is a relative measure. It describes the share of the variation in the dependent variable that the regression line accounts for, not the size of the prediction errors in the units of that variable.

For this reason the coefficient of determination is often reported together with other measures of fit such as the root mean squared error (RMSE) or the mean absolute error (MAE), which express the typical prediction error in the units of the dependent variable. The coefficient of determination can also be used to compare the fit of different regression models to a given data set.
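As a rough illustration of how these measures complement each other, the sketch below computes all three for a hypothetical set of observed and predicted values, then repeats the calculation after multiplying everything by 10. It assumes the definition $R^2 = 1 - SS_{res}/SS_{tot}$, which the text does not state explicitly; all names and numbers are illustrative.

```python
import math

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error, in the units of y."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical observations and predictions.
y_true = [2, 4, 5, 4, 5]
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]

# The same data expressed in units ten times smaller.
y_true_scaled = [10 * t for t in y_true]
y_pred_scaled = [10 * p for p in y_pred]

for label, t, p in [("original", y_true, y_pred), ("scaled x10", y_true_scaled, y_pred_scaled)]:
    print(f"{label}: R^2 = {r_squared(t, p):.3f}, RMSE = {rmse(t, p):.3f}, MAE = {mae(t, p):.3f}")
```

Rescaling leaves $R^2$ unchanged while RMSE and MAE grow tenfold, which is why the error measures are needed to judge how large the prediction errors actually are.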

The coefficient of determination can also be used to compare different regression techniques, such as linear regression and polynomial regression, to see which fits a given data set better. Such comparisons need care: on the data used for fitting, a higher-degree polynomial never has a lower coefficient of determination than a lower-degree one, so adjusted R-squared or an evaluation on separate data gives a fairer comparison.

In summary, the coefficient of determination is a measure of how well a regression line fits a set of data. In simple linear regression it is the square of the correlation coefficient between the two variables and ranges from 0 to 1, with a higher value indicating a better fit. It is best read together with error measures such as the root mean squared error and mean absolute error, and it can be used to compare the fit of different regression models and techniques.

## Suggested literature

- Nagelkerke, N. J. (1991). *A note on a general definition of the coefficient of determination*. Biometrika, 78(3), 691-692.
- Ozer, D. J. (1985). *Correlation and the coefficient of determination*. Psychological Bulletin, 97(2), 307.
- Barrett, J. P. (1974). *The coefficient of determination—some limitations*. The American Statistician, 28(1), 19-20.