Heteroskedasticity

From CEOpedia | Management online

Revision as of 15:02, 1 November 2022

Heteroscedasticity occurs when homoscedasticity, one of the most important assumptions of Ordinary Least Squares (OLS) regression, is not fulfilled. OLS assumes that the errors are normally and independently distributed and, in particular, that the variance of the error terms stays constant across observations.

[1]

Definition of Heteroscedasticity

Heteroskedasticity means that the residuals of a model do not all have the same variance, i.e. the spread of the true errors differs from observation to observation. The variance of the errors then depends on the independent variables, which distorts the variance of the OLS estimators and therefore their standard errors: Var(ε_i) = σ_i² ≠ σ² [2]
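The definition can be illustrated with simulated data (a minimal sketch in Python/NumPy; the model and all numbers are invented for illustration):

```python
import numpy as np

# Simulated data where the error spread grows with x, i.e. Var(e_i) = sigma_i^2
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)   # error sd proportional to x

slope, intercept = np.polyfit(x, y, 1)          # plain OLS fit
resid = y - (intercept + slope * x)

# Residual variance differs between low-x and high-x observations
low_var = resid[x < 5].var()
high_var = resid[x >= 5].var()
print(high_var > 2 * low_var)   # True: the variance is not constant
```

Under homoscedasticity the two variances would differ only by sampling noise; here the high-x residuals are clearly more dispersed.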

Consequences of heteroscedasticity

If you run your regression in the presence of heteroscedasticity, you still get unbiased estimates of your beta coefficients, because there is still no correlation between the explanatory variables and the residuals. So consistency and unbiasedness are preserved when only the homoscedasticity assumption is violated, and the model fit itself is not affected. Other parts of the analysis, however, are:

  • The coefficient estimates are no longer efficient
  • The standard errors are biased, and so are the test statistics based on them

Because of the wrong standard errors, the t-statistics are wrong and we cannot make valid statements about significance. For example, if the standard errors are too small, the t-statistics are inflated and the null hypothesis is rejected too often. Thus both inference and efficiency are affected: the estimates no longer have minimum variance. It is therefore very important to correct for heteroskedasticity in order to get a useful interpretation of your model and correct statistical test decisions.

[3] [4]

Reasons for Heteroscedasticity

Heteroscedasticity is often found in time series data or cross-sectional data. Reasons can be omitted variables, outliers in data, or incorrectly specified model equations.

How to find out if there is heteroskedasticity?

When in doubt, you should assume that there is heteroscedasticity in your regression and then check whether this is actually the case.

Plot in R

There are several ways to find out whether there is heteroskedasticity. The fastest is to use statistical software such as RStudio and plot the residuals against the fitted values. If the plot shows any kind of trend or pattern, it is very likely that the OLS assumption is violated and heteroscedasticity is present. If the values are randomly scattered, the assumption is not violated.
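The same check can be done numerically (a sketch in Python/NumPy with simulated data, rather than the R plot the article describes):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)   # heteroskedastic errors

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
resid = y - fitted

# In a residuals-vs-fitted scatter plot these data show a funnel shape;
# numerically, the spread of the residuals grows with the fitted values:
pattern = np.corrcoef(fitted, np.abs(resid))[0, 1]
print(pattern > 0.2)   # True: a clear pattern -> likely heteroskedastic
```

A correlation near zero between fitted values and absolute residuals would instead suggest a random scatter, i.e. no visible violation.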

Breusch-Pagan Test

There is also the possibility to run the Breusch-Pagan test (in R, the lmtest package). This test checks whether the independent variables affect the error terms by regressing the squared residuals (a simple approximation of the variance of u) on the regressors and checking the significance of that auxiliary regression.
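The mechanics can be sketched by hand (Python/NumPy with simulated data, as a substitute for the R package the article mentions): regress the squared residuals on the regressors and compare the LM statistic n·R² against a chi-squared critical value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)   # error sd grows with x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Auxiliary regression: squared residuals on the same regressors
u2 = resid ** 2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
r2 = 1 - ((u2 - X @ g) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

lm = n * r2                 # LM statistic ~ chi^2 with 1 df under H0
print(lm > 3.84)            # True -> reject homoscedasticity at the 5% level
```

The 3.84 threshold is the 5% critical value of the chi-squared distribution with one degree of freedom (one regressor besides the constant).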

White Test

This test is more general and does not test only one specific form of heteroscedasticity. In general, it adds squares and interaction terms to capture any dependence between the variance of the residuals and the independent variables. An easier variant with fewer degrees of freedom uses the fitted values and their squares (het.test in R). In both cases, the test's null hypothesis is that the residuals are homoscedastic, and the alternative hypothesis maintains the opposite. If the p-value is lower than the significance level, there is heteroskedasticity.
H0: Residuals are homoscedastic.
Ha: Residuals are not homoscedastic (heteroscedasticity is present).
If the null hypothesis is rejected, the residuals are heteroscedastic; if you fail to reject it, they are homoscedastic.
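A hand-rolled sketch of the White test (Python/NumPy, simulated data; with a single regressor the "interaction terms" reduce to the squared regressor):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)   # error sd grows with x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ beta) ** 2

# White's auxiliary regression: squared residuals on the regressors and
# their squares (plus interactions, if there were several regressors)
Z = np.column_stack([np.ones(n), x, x ** 2])
g, *_ = np.linalg.lstsq(Z, u2, rcond=None)
r2 = 1 - ((u2 - Z @ g) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

lm = n * r2                 # ~ chi^2 with 2 df under H0
print(lm > 5.99)            # True -> reject homoscedasticity at the 5% level
```

Compared with Breusch-Pagan, the only change is the richer auxiliary design matrix Z, which costs extra degrees of freedom (here chi-squared with 2 df, 5% critical value 5.99).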

Goldfeld-Quandt Test

The main idea here is to compare the variances of two subsamples: you split the data into two groups and run a separate regression for each. The null hypothesis says that the two groups have the same error variance, which means there is homoscedasticity; if the variances differ significantly, heteroscedasticity is present. You can also use the Levene test, the Glejser test, or the RESET test [5]
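A sketch of the subsample comparison (Python/NumPy, simulated data; dropping the middle observations, as is common for this test, is an added detail not spelled out above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x = np.sort(rng.uniform(1, 10, n))
y = 2 + 3 * x + rng.normal(0, 0.5 * x)   # error sd grows with x

def rss(xs, ys):
    """Residual sum of squares of a simple OLS fit."""
    slope, intercept = np.polyfit(xs, ys, 1)
    r = ys - (intercept + slope * xs)
    return (r ** 2).sum()

# Sort by the suspect regressor, drop the middle fifth, and compare the
# residual variances of the low-x and high-x subsamples
k = n // 5
m = (n - k) // 2
rss_low = rss(x[:m], y[:m])
rss_high = rss(x[-m:], y[-m:])

F = rss_high / rss_low   # F-distributed under H0 of equal variances
print(F > 1.5)           # True: the two variances clearly differ
```

Under homoscedasticity the ratio would hover near 1; a value far above the F critical value leads to rejection.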

What to do against heteroskedasticity?

In order to correct for heteroscedasticity, several approaches can be found in the literature.

One possibility is to change to a WLS regression, the so-called weighted least squares regression. That means you weight the observations based on their error variances; the choice of weights depends on the structure of the data. For this solution you need to know the error variance of every observation. Very often the size of the variances is unknown, which makes this approach impractical.
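A sketch of WLS (Python/NumPy) under the artificial assumption that the error standard deviation is known to be proportional to x, so each row is weighted by 1/variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)   # error sd proportional to x (assumed known)

# WLS: weight each observation by 1/variance; here Var(e_i) ~ x_i^2,
# so sqrt(w_i) = 1/x_i, and we rescale the rows before solving least squares
sw = 1.0 / x
X = np.column_stack([np.ones(n), x])
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(beta_wls)   # close to the true coefficients (2, 3)
```

Rescaling each row by the reciprocal error standard deviation makes the transformed errors homoscedastic, so ordinary least squares on the transformed data is efficient again.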

Another alternative is to redesign the model, that is, to transform the dependent variable and the data in order to stabilize the variance. One example is a quadratic transformation of the values.
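As an illustration of a variance-stabilizing transformation (Python/NumPy, simulated data; a log transform is shown here instead of the quadratic one mentioned above, since it suits multiplicative errors):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = np.exp(1 + 0.3 * x + rng.normal(0, 0.3, n))   # multiplicative errors

def pattern(ys):
    """Correlation between fitted values and |residuals| of an OLS fit."""
    slope, intercept = np.polyfit(x, ys, 1)
    fitted = intercept + slope * x
    return np.corrcoef(fitted, np.abs(ys - fitted))[0, 1]

before = pattern(y)          # residual spread grows with the fitted values
after = pattern(np.log(y))   # after the log transform the spread is stable
print(before > after)
```

The transformed model is linear with roughly constant error variance, so the residuals-vs-fitted pattern largely disappears.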

Moreover, the literature shows how to address the problem with bootstrapping. An advantage is that no strong assumptions are necessary, which has made it a popular alternative for calculating p-values and confidence intervals when assumptions are violated. Instead of relying on unknown parameters and assumed probability distributions, the data-generating process is reconstructed from the observed data itself. [6]
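One common variant is the pairs bootstrap, sketched here in Python/NumPy on simulated data (an illustration, not necessarily the specific procedure the cited literature uses): whole (x, y) rows are resampled with replacement, so no assumption about the error variance structure is required.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)   # heteroskedastic errors

# Pairs bootstrap: resample whole (x, y) rows with replacement and refit;
# the slope's sampling distribution is estimated from the refits
slopes = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    slopes.append(np.polyfit(x[idx], y[idx], 1)[0])

lo, hi = np.percentile(slopes, [2.5, 97.5])
print(lo, hi)   # percentile 95% confidence interval for the slope
```

The percentile interval typically brackets the true slope (3 in this simulation) even though the usual OLS standard errors would be unreliable here.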

Footnotes

  1. Wooldridge, J. (2005), pg. 13
  2. Wooldridge, J. (2005), pg. 13
  3. Astivia, O., & Zumbo, B. (2019), pg. 2-4
  4. Kaufmann, R. (2013), pg. 2-5
  5. Astivia, O., & Zumbo, B. (2019), pg. 4-7
  6. Astivia, O., & Zumbo, B. (2019), pg. 34

References

Author: Annamarie Dietz