Heteroskedasticity
Revision as of 08:59, 27 October 2022
Heteroscedasticity is present when homoscedasticity, one of the most important assumptions of ordinary least squares (OLS) regression, does not hold.
One of the assumptions of OLS regression is that the errors are normally and independently distributed. Homoscedasticity is the further assumption that the variance of the error terms stays constant across all observations (in time series data, across periods).
Definition of Heteroscedasticity
Heteroskedasticity means that the error terms of the model do not all have the same variance. In other words, the spread of the errors around zero is not the same for every observation or period. Typically, the variance of the errors depends on the independent variables, which makes the usual formula for the variance of the OLS estimators, and therefore their standard errors, incorrect.
$$\operatorname{Var}(\epsilon_i) = \sigma_i^2 \neq \sigma^2$$
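To make the effect on the OLS variance concrete, consider a simple regression $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ in which each error has its own variance $\sigma_i^2$ given the regressors (this notation is an assumption made for illustration). The sampling variance of the slope estimator is then

$$\operatorname{Var}(\hat\beta_1) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sigma_i^2}{\left( \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^2}$$

Only when $\sigma_i^2 = \sigma^2$ for every observation does this reduce to the familiar $\sigma^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2$, which is the expression behind the standard errors in ordinary regression output.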
Consequences of heteroscedasticity
If you run an OLS regression when heteroscedasticity is present, you still get unbiased estimates of your beta coefficients, provided the explanatory variables are uncorrelated with the error term. Consistency and unbiasedness are therefore still given if only the homoscedasticity assumption is violated. Overall, the model fit (for example R²) is not affected.
But other parts are affected:
- The estimates of your coefficients are no longer efficient
- The standard errors are biased, and so are the test statistics based on them
Because the standard errors are wrong, the t-statistics are wrong too, and we cannot make any valid statement about the significance of the coefficients. For example, if the standard errors are too small, the null hypothesis is rejected too often. Thus, inference as well as efficiency is affected. The estimates are no longer efficient because they no longer have the minimum variance among linear unbiased estimators. It is therefore very important to correct for heteroskedasticity in order to interpret your model and your statistical test decisions correctly.
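The two consequences can be checked with a small simulation. The following is a minimal sketch in plain Python (standard library only, rather than R), under an assumed data-generating process that is not from the original article: the true slope is 2 and the error standard deviation is 0.05 * x^2. The helper name `fit_with_naive_se` is hypothetical. It compares the average slope estimate with the true slope, and the actual sampling spread of the slope with the standard error reported by the usual formula.

```python
import random
import statistics

def fit_with_naive_se(x, y):
    """Return the OLS slope and its conventional standard error,
    i.e. the one that is only valid under homoscedasticity."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    naive_se = (rss / (n - 2) / sxx) ** 0.5
    return slope, naive_se

rng = random.Random(42)
true_slope = 2.0
n, replications = 100, 2000
x = [1 + 9 * i / (n - 1) for i in range(n)]  # fixed design on [1, 10]

slopes, naive_ses = [], []
for _ in range(replications):
    # Assumed error structure: standard deviation 0.05 * x^2, rising with x.
    y = [1 + true_slope * xi + rng.gauss(0, 0.05 * xi ** 2) for xi in x]
    slope, naive_se = fit_with_naive_se(x, y)
    slopes.append(slope)
    naive_ses.append(naive_se)

mean_slope = statistics.mean(slopes)         # average estimate across samples
actual_sd = statistics.stdev(slopes)         # true sampling variability
mean_naive_se = statistics.mean(naive_ses)   # what the usual formula reports

print(f"mean slope estimate: {mean_slope:.3f}")
print(f"actual sd of slope:  {actual_sd:.3f}")
print(f"mean naive SE:       {mean_naive_se:.3f}")
```

In this setup the average estimate should sit close to the true slope, while the conventional standard error understates the actual spread, which is exactly the situation in which t-tests reject too often.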
Reasons for Heteroscedasticity
Heteroscedasticity is often found in time series data or cross-sectional data. Reasons can be omitted variables, outliers in the data, or an incorrectly specified model equation.
How to find out if there is heteroskedasticity?
When in doubt, you should assume that heteroscedasticity is present in your regression and then check whether this is actually the case.
- Plot in R
There are several ways to find out whether heteroskedasticity is present. The fastest way is to use statistical software, for example R in RStudio, to plot the residuals against the fitted values.
If the spread of the residuals shows any kind of trend or pattern, for example a funnel shape, then it is very likely that the homoscedasticity assumption of the OLS model is violated and there is heteroscedasticity. If the residuals scatter randomly with a roughly constant spread, the assumption is not violated.
[Two figures missing: residuals plotted against fitted values under homoscedasticity and under heteroscedasticity.] The two graphics illustrate the difference between homoscedasticity and heteroscedasticity. In the first illustration, the residuals are randomly distributed with no trend, which is the case of homoscedasticity. The second illustration shows a trend in the spread, which implies heteroscedasticity.
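Without the figures, the same visual check can be approximated numerically. The sketch below is plain Python (standard library only, not R) on hypothetical simulated data whose error standard deviation is assumed to be 0.5 * x. It sorts the residuals by fitted value, splits them into four bins, and prints the residual standard deviation per bin. The helper name `fit_simple_ols` is an assumption for illustration.

```python
import random
import statistics

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: the error standard deviation grows with x (assumption).
rng = random.Random(1)
n = 400
x = [1 + 9 * i / (n - 1) for i in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

intercept, slope = fit_simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Order the residuals by fitted value and split them into four equal bins.
ordered = [r for _, r in sorted(zip(fitted, residuals))]
bin_size = n // 4
spread_per_bin = [
    statistics.stdev(ordered[k * bin_size:(k + 1) * bin_size]) for k in range(4)
]
for k, spread in enumerate(spread_per_bin, start=1):
    print(f"bin {k}: residual sd = {spread:.2f}")
```

A residual spread that rises steadily from the first to the last bin is the numeric counterpart of the funnel shape in a residuals-versus-fitted plot; roughly equal spreads would point towards homoscedasticity.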
- Breusch-Pagan Test
Another possibility is to run the Breusch-Pagan test to identify heteroscedasticity (in R, it is available in the lmtest package). This test checks whether the independent variables affect the variance of the error terms by regressing the squared residuals (a feasible stand-in for the variance of u) on the regressors and checking their joint significance.
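The auxiliary regression behind this test can be sketched in a few lines. The example below is plain Python (standard library only) and is not the lmtest implementation. It assumes a single regressor and uses the studentized (Koenker) form of the statistic, n times the R² of the auxiliary regression, compared with a chi-square distribution with one degree of freedom. The data and the function names `simple_ols` and `breusch_pagan` are hypothetical.

```python
import math
import random

def simple_ols(x, y):
    """Return (residuals, r_squared) for y regressed on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    tss = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - sum(r ** 2 for r in residuals) / tss
    return residuals, r_squared

def breusch_pagan(x, y):
    """Studentized Breusch-Pagan test for one regressor: (LM statistic, p-value)."""
    residuals, _ = simple_ols(x, y)
    squared = [r ** 2 for r in residuals]
    # Auxiliary regression: squared residuals on the regressor.
    _, aux_r_squared = simple_ols(x, squared)
    lm_stat = max(len(x) * aux_r_squared, 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm_stat / 2))
    return lm_stat, p_value

rng = random.Random(7)
n = 500
x = [1 + 9 * i / (n - 1) for i in range(n)]
y_hetero = [1 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # sd grows with x
y_homo = [1 + 2 * xi + rng.gauss(0, 2.5) for xi in x]         # constant sd

lm_hetero, p_hetero = breusch_pagan(x, y_hetero)
lm_homo, p_homo = breusch_pagan(x, y_homo)
print(f"heteroscedastic data: LM = {lm_hetero:.2f}, p = {p_hetero:.4g}")
print(f"homoscedastic data:   LM = {lm_homo:.2f}, p = {p_homo:.4g}")
```

With several regressors the auxiliary regression would include all of them and the degrees of freedom would equal their number; that case needs a general chi-square distribution function, which the standard library does not provide.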
- White Test
This test is more general than the Breusch-Pagan test and is not restricted to a linear relationship between the error variance and the regressors. It adds squares and interaction terms of the regressors to catch all interdependence between the variance of the residuals and the independent variables. A simpler variant that uses fewer degrees of freedom replaces these terms with the fitted values and their squares. (het.test in R)
In both cases, the null hypothesis is that the residuals are homoscedastic, and the alternative hypothesis is that they are not:
H0: Residuals are homoscedastic
Ha: Residuals are not homoscedastic (heteroscedasticity is present)
If the p-value is lower than your significance level, you reject the null hypothesis and conclude that the residuals are heteroscedastic. If you fail to reject the null hypothesis, there is no evidence against homoscedasticity.
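For the White test, a comparable sketch in plain Python (standard library only, not the R implementation) is given below. It assumes a single regressor, so the auxiliary regression contains only x and x^2 and there are no interaction terms; the statistic n times R² is then compared with a chi-square distribution with two degrees of freedom. The data and all function names are hypothetical.

```python
import math
import random

def solve_linear_system(matrix, rhs):
    """Solve matrix * beta = rhs by Gauss-Jordan elimination with pivoting."""
    k = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def ols_fit(columns, y):
    """OLS of y on an intercept plus the given columns: (residuals, r_squared)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    residuals = [yi - sum(c * v for c, v in zip(beta, row))
                 for row, yi in zip(design, y)]
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return residuals, 1 - sum(r ** 2 for r in residuals) / tss

def white_test(x, y):
    """White test for one regressor: (LM statistic, p-value)."""
    residuals, _ = ols_fit([x], y)
    squared = [r ** 2 for r in residuals]
    # Auxiliary regression: squared residuals on x and x^2.
    _, aux_r_squared = ols_fit([x, [xi ** 2 for xi in x]], squared)
    lm_stat = max(len(y) * aux_r_squared, 0.0)
    # Chi-square survival function with 2 degrees of freedom is exp(-s / 2).
    return lm_stat, math.exp(-lm_stat / 2)

rng = random.Random(3)
n = 500
x = [1 + 9 * i / (n - 1) for i in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # assumed: sd grows with x

lm_stat, p_value = white_test(x, y)
print(f"White test: LM = {lm_stat:.2f}, p = {p_value:.4g}")
```

With one regressor, using x and x^2 is equivalent to the variant based on the fitted values and their squares, because the fitted values are a linear function of x.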
- Goldfeld-Quandt Test
The main idea here is to compare the error variances of two subsamples. You split the data into two groups, typically after ordering the observations by the variable suspected of driving the variance, and run a separate regression for each group. The null hypothesis says that both groups have the same error variance, which means there is homoscedasticity. If the two variances differ significantly, heteroscedasticity is present.
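The procedure can be sketched as follows in plain Python (standard library only). The data are hypothetical, with an error standard deviation assumed to be 0.5 * x, and the choice to drop the middle 60 of 300 observations is an illustrative assumption, as are the function names. The sketch returns only the F statistic and its degrees of freedom, because the standard library has no F distribution for a p-value.

```python
import random

def residual_variance(x, y):
    """Fit y on x by OLS and return the residual variance RSS / (n - 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return rss / (n - 2)

def goldfeld_quandt(x, y, drop_middle):
    """Return (F statistic, degrees of freedom per group).

    Observations are ordered by x, `drop_middle` central observations are
    left out, and the high-x residual variance is divided by the low-x one.
    """
    pairs = sorted(zip(x, y))
    group_size = (len(pairs) - drop_middle) // 2
    low, high = pairs[:group_size], pairs[-group_size:]
    var_low = residual_variance([p[0] for p in low], [p[1] for p in low])
    var_high = residual_variance([p[0] for p in high], [p[1] for p in high])
    return var_high / var_low, group_size - 2

rng = random.Random(5)
n = 300
x = [1 + 9 * i / (n - 1) for i in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # assumed: sd grows with x

f_stat, df = goldfeld_quandt(x, y, drop_middle=60)
print(f"Goldfeld-Quandt F = {f_stat:.2f} with ({df}, {df}) degrees of freedom")
```

The statistic would then be compared with a critical value of the F distribution with the printed degrees of freedom; a ratio far above 1 points to a larger error variance in the high-x group.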
You can also use the Levene test, the Glejser test, or the RESET test.
What to do against heteroskedasticity?
Several approaches to correct for heteroscedasticity can be found in the literature.
- Change to a WLS regression
You could switch to a weighted least squares (WLS) regression. That means each observation is weighted by the inverse of its error variance, so that noisier observations count less. The choice of weights depends on the structure of the data. For this solution, you need to know the error variance of every observation, at least up to a constant factor. Very often the variances are unknown, which makes this approach impractical.
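A small simulation illustrates what WLS gains when the variance structure is known. The sketch below is plain Python (standard library only) and rests on a strong assumption: the error variance is taken to be proportional to x^2, so the weights 1 / x^2 are correct by construction. The function name `weighted_slope` and the data are hypothetical.

```python
import random
import statistics

def weighted_slope(x, y, weights):
    """Slope of the weighted least-squares line; equal weights give plain OLS."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    return sxy / sxx

rng = random.Random(9)
n, replications = 200, 1000
x = [1 + 9 * i / (n - 1) for i in range(n)]
ols_weights = [1.0] * n
wls_weights = [1 / xi ** 2 for xi in x]  # inverse of the assumed variance shape

ols_slopes, wls_slopes = [], []
for _ in range(replications):
    # Assumed error structure: standard deviation 0.5 * x, true slope 2.
    y = [1 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]
    ols_slopes.append(weighted_slope(x, y, ols_weights))
    wls_slopes.append(weighted_slope(x, y, wls_weights))

ols_sd = statistics.stdev(ols_slopes)
wls_sd = statistics.stdev(wls_slopes)
wls_mean = statistics.mean(wls_slopes)
print(f"sd of OLS slope: {ols_sd:.3f}")
print(f"sd of WLS slope: {wls_sd:.3f}")
print(f"mean WLS slope:  {wls_mean:.3f}")
```

Both estimators are centred on the true slope, but the WLS slope varies less from sample to sample. If the assumed weights were wrong, this efficiency gain would not be guaranteed.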
- Redesigning the model
Another alternative is to transform your dependent variable, or other variables in the data, in order to stabilize the variance. One example is to square your values; logarithmic or square-root transformations are other common choices.
- Bootstrapping
Another fix is the bootstrap. An advantage is that no strong assumptions are necessary. It has become a popular alternative for calculating p-values and confidence intervals when assumptions are violated. The idea is to mimic the data-generating process, whose parameters and probability distributions are unknown, by repeatedly resampling from the observed data.
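One common variant is the pairs bootstrap, which resamples whole (x, y) observations with replacement. The sketch below uses plain Python (standard library only). The choice of the pairs bootstrap, the 1000 resamples, the percentile interval, the simulated data, and the function name `ols_slope` are all assumptions made for illustration and are not prescribed by the sources.

```python
import random
import statistics

def ols_slope(x, y):
    """Return the OLS slope of y regressed on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

rng = random.Random(11)
n = 200
x = [1 + 9 * i / (n - 1) for i in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # assumed: sd grows with x

slope_hat = ols_slope(x, y)

n_boot = 1000
boot_slopes = []
for _ in range(n_boot):
    # Draw n observation indices with replacement and refit on the resample.
    idx = [rng.randrange(n) for _ in range(n)]
    boot_slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))

boot_se = statistics.stdev(boot_slopes)  # bootstrap standard error
boot_slopes.sort()
ci_lower = boot_slopes[n_boot * 25 // 1000]        # 2.5th percentile
ci_upper = boot_slopes[n_boot * 975 // 1000 - 1]   # 97.5th percentile

print(f"slope estimate:      {slope_hat:.3f}")
print(f"bootstrap SE:        {boot_se:.3f}")
print(f"95% percentile CI:   [{ci_lower:.3f}, {ci_upper:.3f}]")
```

Because each resample keeps x and y together, the relationship between the regressor and the error variance is preserved, so the resulting standard error does not rely on the homoscedasticity assumption.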
References
- Astivia, O., & Zumbo, B. (2019). Heteroskedasticity in Multiple Regression Analysis: What it is, How to Detect it and How to Solve it with Applications in R and SPSS. Practical Assessment, Research, and Evaluation, Vol. 24, Article 1.
- Kaufman, R. (2013). Heteroskedasticity in Regression: Detection and Correction.
- Wooldridge, J. (2005). Introductory Econometrics: A Modern Approach.
Author: Annamarie Dietz