Variance inflation factor
Revision as of 04:41, 18 November 2023
Variance inflation factor (VIF) is a statistical measure that assesses the degree of multicollinearity, that is, linear correlation among the independent variables, in a multiple regression model. VIF estimates how much the variance of an estimated regression coefficient is inflated because its variable is correlated with the other independent variables. A VIF of 1 indicates that the variable is uncorrelated with the others; values above 1 indicate increasing correlation. Commonly used rules of thumb treat a VIF above 5 or 10 as a sign of problematic multicollinearity, suggesting that the variable provides little additional information beyond what is already provided by the other independent variables in the model.
Example of variance inflation factor
- Let's say we have a multiple linear regression model with three independent variables, x1, x2, and x3, and a dependent variable, y. The variance inflation factor (VIF) for each independent variable is the ratio of the variance of its estimated regression coefficient in the full model to the variance that coefficient would have if the variable were the only one in the model. For x1, this is the variance of the estimated coefficient for x1 when all three independent variables are included, divided by the variance of the estimated coefficient for x1 when only x1 is included (with the error variance taken to be the same in both cases). A VIF of 1 would indicate that x1 is uncorrelated with x2 and x3, while larger values indicate stronger correlation. Values above roughly 5 to 10 are commonly treated as a problematic level of multicollinearity, suggesting that x1 provides little additional information beyond what x2 and x3 already provide.
Another example of variance inflation factor can be seen in the banking industry. When a bank is considering a loan application, it may use a multiple linear regression model with several independent variables to predict the likelihood of loan repayment. In this case, the VIF for each of the independent variables can be calculated to assess the degree of multicollinearity. For example, if the independent variables include credit score, income, and debt-to-income ratio, the VIF shows how strongly each of them is linearly related to the other two. A VIF close to 1 means the variable contributes largely independent information. If the VIF for one or more of the variables is high (for example, above 5 or 10), it may indicate a problematic level of multicollinearity, suggesting that the variable provides little information beyond what is already provided by the other independent variables in the model.
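As a rough numerical illustration of the three-variable example, the following Python sketch computes a VIF for each of x1, x2 and x3 by regressing it on the other two. It assumes NumPy is available; the helper name `vif` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def vif(X):
    """Return one VIF per column of X (n_samples x n_predictors).

    Each predictor is regressed on the remaining ones plus an intercept,
    and its VIF is 1 / (1 - R^2) of that auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic data (an assumption, not taken from any real data set)
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)              # independent of x1
x3 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1

vifs = vif(np.column_stack([x1, x2, x3]))
print(dict(zip(["x1", "x2", "x3"], vifs.round(2))))
```

In this setup x2 should have a VIF close to 1, while x1 and x3 should both have large VIFs because each is almost fully explained by the other.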
Formula of variance inflation factor
VIF is calculated as the ratio of the variance of an estimated regression coefficient in the full model to the variance that the same coefficient would have if its variable were uncorrelated with all other independent variables:
$$\begin{align} VIF_j = \frac{Var(\hat{\beta_j})}{Var_0(\hat{\beta_j})} \end{align}$$
where $$\hat{\beta_j}$$ is the estimated regression coefficient for the $$j^{th}$$ variable, and $$Var_0(\hat{\beta_j})$$ is its variance in the uncorrelated case, which equals its variance in a model that contains only the $$j^{th}$$ variable and an intercept.
The variance of an estimated coefficient in the full model is calculated as:
$$\begin{align} Var(\hat{\beta_j})=\sigma^2 \left[ (X^TX)^{-1} \right]_{jj} \end{align}$$
where $$\sigma^2$$ is the variance of the error term, $$X$$ is the matrix of independent variables including a column of ones for the intercept, and $$X^TX$$ is the transpose of $$X$$ multiplied by $$X$$.
The variance of the estimated coefficient in the uncorrelated case is calculated as:
$$\begin{align} Var_0(\hat{\beta_j})=\frac{\sigma^2}{\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2} \end{align}$$
where $$x_{ij}$$ is the $$i^{th}$$ observation of the $$j^{th}$$ variable and $$\bar{x}_j$$ is its mean.
Finally, the VIF is calculated as the ratio of these two variances, in which $$\sigma^2$$ cancels:
$$\begin{align} VIF_j = \frac{Var(\hat{\beta_j})}{Var_0(\hat{\beta_j})} = \left[ (X^TX)^{-1} \right]_{jj} \sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2 = \frac{1}{1-R_j^2} \end{align}$$
where $$R_j^2$$ is the coefficient of determination obtained by regressing the $$j^{th}$$ variable on all other independent variables.
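The VIF can be checked numerically in two ways: from the diagonal of $$(X^TX)^{-1}$$ multiplied by the centered sum of squares of the variable, and from the widely used form $$1/(1-R_j^2)$$, where $$R_j^2$$ comes from regressing the $$j^{th}$$ variable on the others. The Python sketch below, assuming NumPy and a design matrix with an intercept column, compares the two. The function names and the simulated predictors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=(n, 3))
# Hypothetical predictors: the third is correlated with the first two.
preds = np.column_stack([z[:, 0], z[:, 1], z[:, 0] + z[:, 1] + 0.5 * z[:, 2]])
X = np.column_stack([np.ones(n), preds])   # design matrix with intercept column

XtX_inv = np.linalg.inv(X.T @ X)

def vif_matrix(j):
    """VIF of predictor j (0-based) from the diagonal of (X^T X)^{-1}."""
    col = preds[:, j]
    # Column 0 of X is the intercept, so predictor j sits at index j + 1.
    return XtX_inv[j + 1, j + 1] * np.sum((col - col.mean()) ** 2)

def vif_r2(j):
    """VIF of predictor j (0-based) as 1 / (1 - R_j^2)."""
    y = preds[:, j]
    others = np.column_stack([np.ones(n), np.delete(preds, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

for j in range(3):
    print(j, vif_matrix(j), vif_r2(j))
```

The two columns of output should agree up to floating-point error, and the error variance never appears because it cancels in the ratio.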
When to use variance inflation factor
VIF is a useful tool to assess the degree of multicollinearity in a multiple regression model. It can be used in the following situations:
- To identify redundant variables that can be removed from the model.
- To detect which independent variables are providing additional information beyond what is already provided by other independent variables in the model.
- To determine which independent variables are strongly linearly related to the other independent variables in the model.
- To help diagnose issues with model specification and to identify potential problems with the underlying data.
- To check the assumption of linear regression that the independent variables are not strongly collinear.
- To help interpret the results of the regression analysis.
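One way the first use above is sometimes put into practice is to drop the variable with the highest VIF repeatedly until all remaining VIFs fall below a chosen cutoff. The Python sketch below assumes NumPy, a cutoff of 10 (a rule of thumb, not a fixed standard), and hypothetical variable names. It relies on the fact that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix.

```python
import numpy as np

def vif(X):
    """VIFs as the diagonal of the inverse correlation matrix of the predictors."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

def drop_high_vif(X, names, threshold=10.0):
    """Remove the predictor with the largest VIF until all are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Synthetic data (an assumption): c is almost exactly a + b, d is unrelated.
rng = np.random.default_rng(3)
n = 400
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + 0.05 * rng.normal(size=n)
d = rng.normal(size=n)
data = np.column_stack([a, b, c, d])

X_kept, kept = drop_high_vif(data, ["a", "b", "c", "d"])
print(kept, vif(X_kept).round(2))
```

Automatic removal of this kind is only a starting point; which of several collinear variables to keep usually depends on subject-matter considerations.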
Types of variance inflation factor
The types of variance inflation factor (VIF) include:
- Global VIF - This measures the overall correlation between independent variables in the model.
- Conditional VIF - This measures the correlation between independent variables in the model, conditioned on other independent variables in the model.
- Local VIF - This measures the correlation between each independent variable and all other independent variables in the model.
- Partial VIF - This measures the correlation between each independent variable and all other independent variables in the model, while controlling for other independent variables in the model.
- Average VIF - This is the mean of the individual VIFs of all the independent variables in the model.
Advantages of variance inflation factor
Variance inflation factor (VIF) is a measure of multicollinearity in a multiple regression model that assesses the degree of correlation between independent variables. There are several advantages to using VIF:
- VIF can help to identify and eliminate redundant variables from a model and simplify the model.
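- VIF can help to identify and address multicollinearity in a model, which inflates the standard errors of the regression coefficients and makes their estimates unstable.
- VIF quantifies, for each independent variable, how much the variance of its coefficient is inflated; the square root of the VIF is the factor by which the standard error increases.
- VIF can reveal linear dependencies among variables that may not have been previously considered, including dependencies that involve more than two variables and are not visible in pairwise correlations.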
- VIF can be used to compare different models with different independent variables and determine which model is more reliable.
Limitations of variance inflation factor
Variance Inflation Factor (VIF) is a useful tool for assessing multicollinearity in a multiple regression model, but it has several limitations:
- VIF does not take into account nonlinear relationships between variables, only linear relationships.
- VIF measures how much the variance of each coefficient is inflated, but not how the multicollinearity affects the value or sign of the estimated coefficients.
- VIF does not provide information about the underlying cause of multicollinearity.
- VIF is sensitive to outliers, so the results may be unreliable in the presence of outliers.
- VIF is also sensitive to small changes in the data, so the results can be affected by minor fluctuations in the data.
- VIF shows that a variable is strongly related to the other variables but not which of them are responsible, so it can be difficult to determine which variables are causing the multicollinearity.
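The first limitation can be illustrated with a small Python sketch (assuming NumPy; the data are synthetic). The second predictor is the square of the first, so it is completely determined by it, yet the VIF stays near 1 because the relationship is not linear.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)   # values symmetric around zero
x_sq = x ** 2                     # fully determined by x, but not linearly

# With only two predictors, R_j^2 is the squared correlation between them.
r = np.corrcoef(x, x_sq)[0, 1]
vif_two_predictors = 1.0 / (1.0 - r ** 2)
print(vif_two_predictors)
```

Because x is symmetric around zero, its linear correlation with its own square is essentially zero, and the VIF gives no warning about the dependence.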
In addition to the Variance Inflation Factor (VIF), there are several other approaches to assessing multicollinearity in a multiple regression model. These include:
- Condition Index (CI): The CI is the square root of the ratio of the largest eigenvalue of the (column-scaled) $$X^TX$$ matrix to each of its other eigenvalues; the largest of these values is the condition number. The CI is used to measure the severity of multicollinearity. The higher the CI, the greater the degree of multicollinearity; values above about 30 are commonly taken to indicate a serious problem.
- Principal Component Analysis (PCA): PCA is a method of dimensionality reduction that can be used to identify groups of highly collinear variables and to replace them with a smaller set of uncorrelated components.
- Variance Decomposition (VD): VD is a method of decomposing the variance of each estimated regression coefficient into proportions associated with each eigenvalue of the $$X^TX$$ matrix, which shows which variables are involved in each near-dependency. It is typically used together with the CI.
- Generalized Variance Inflation Factor (GVIF): GVIF is an extension of the VIF to sets of related coefficients, such as the dummy variables that represent one categorical variable or the terms of a polynomial.
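As a sketch of the first alternative, condition indices can be obtained from the singular values of the design matrix after each column is scaled to unit length. The example below assumes NumPy; the function name `condition_indices` and the simulated data are hypothetical.

```python
import numpy as np

def condition_indices(X):
    """Condition indices of X: largest singular value divided by each singular value.

    Columns are first scaled to unit length so that the result does not
    depend on the units of measurement.
    """
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)   # singular values, largest first
    return s[0] / s

# Synthetic data (an assumption): x3 is almost exactly x1 + x2.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])   # intercept column included

ci = condition_indices(X)
print(ci.round(1))
```

The singular values of the scaled matrix are the square roots of the eigenvalues of its $$X^TX$$, so this matches the eigenvalue-based definition; the largest index is the condition number.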
In summary, there are several approaches to assessing and dealing with multicollinearity in a multiple regression model. Variance Inflation Factor (VIF) is one such approach, but there are several others, such as Condition Index (CI), Principal Component Analysis (PCA), Variance Decomposition (VD), and Generalized Variance Inflation Factor (GVIF).
References
- Stine, R. A. (1995). Graphical interpretation of variance inflation factors. The American Statistician, 49(1), 53-56.