Multiple regression analysis
The most common model used in multiple regression analysis is the linear regression model. In mathematical terms, regression analysis with this model involves finding the "coefficient of multiple correlation (R) that defines the amount of linear correlation in between the dependent variable y and the independent variables \(x_1, x_2, \ldots, x_n\)".
Linear multiple regression model
The relation between the dependent variable y and n independent variables \(x_1, x_2, \ldots, x_n\) can be expressed as a linear regression model:\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n + \varepsilon\]
where \(\varepsilon\) is the residual (error) term and \(\beta_k\) \((k = 0, 1, 2, \ldots, n)\) are the regression coefficients. \(\beta_0\) is a constant called the regression intercept, while \(\beta_1, \beta_2, \ldots, \beta_n\) are the regression slope parameters.
The goal of multiple regression analysis is to estimate all coefficients \(\beta_k\) in the above equation.
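For a data set with several observations, the same model is often written in matrix form. The following is a sketch under the assumption of m observations; the symbols m, \(X\) and the double index \(x_{ij}\) (value of the j-th independent variable in the i-th observation) are introduced here for illustration and do not appear in the sources cited:
\[\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & \cdots & x_{mn} \end{pmatrix}, \qquad \boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_n)^{\top}\]
The leading column of ones in \(X\) carries the intercept \(\beta_0\), so estimating the model amounts to estimating the single vector \(\boldsymbol{\beta}\).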
Calculating regression coefficients
In linear models, calculating the regression coefficients means finding the linear function that best fits a given data set (with one dependent and one independent variable, this function is a straight line). Depending on how "best fit" is defined (in statistics, this problem is called goodness of fit), different methods can be used to perform the regression.
The most popular method of finding the regression coefficients of linear models is the ordinary least squares method. It minimizes the sum of squared vertical distances (residuals) between the data points and the regression surface. Its popularity derives from its effectiveness and ease of calculation.
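As an illustration of ordinary least squares, the sketch below estimates the coefficients by solving the normal equations \((X^{\top}X)\boldsymbol{\beta} = X^{\top}\mathbf{y}\) with plain Gaussian elimination. It is a minimal example using only the Python standard library, not a production implementation; the function names `solve_linear_system` and `fit_ols` and the sample data are hypothetical and chosen only for this example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_ols(rows, y)
print(coefficients)  # intercept first, then one slope per independent variable
```

Because the sample data contain no noise, the estimated coefficients should match the generating values 1, 2 and 3 up to rounding error. In practice, a numerical library routine based on a QR or singular value decomposition is usually preferred over explicit normal equations, which can be ill-conditioned.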
It must be noted that the independent variables of the model may be correlated with each other (and such correlations are sometimes hard to detect before the analysis). To identify them, the pairwise correlations between the independent variables must be analyzed. If the correlation between any two independent variables is high, one of them must be eliminated from the model.
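The check for correlated independent variables can be sketched as follows. The example computes the Pearson correlation coefficient for every pair of independent variables and flags pairs whose absolute correlation exceeds a threshold. The threshold of 0.9, the function names and the sample values are assumptions made for this illustration; the sources cited do not prescribe a specific cut-off.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical independent variables; x2 is roughly twice x1.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = highly_correlated_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.3f})")
```

In this made-up data set only the pair x1 and x2 is flagged, which suggests dropping one of the two before fitting the model. Pairwise correlations do not reveal dependencies that involve three or more variables at once, so this check is a first screen rather than a complete diagnosis.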
When the model is nonlinear, regression must be performed by an iterative procedure. Nonlinear regression analysis aims to find the nonlinear function that best fits a given data set. With one dependent and one independent variable, this function is a curve. To find the coefficients of such a model, numerical optimization algorithms are usually used. When the dependent variable has constant variance, the ordinary least squares method may be used to minimize the sum of squared residuals. Otherwise, the weighted least squares method is used, which minimizes the sum of weighted squared residuals.
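One possible iterative procedure is the Gauss-Newton method, sketched below for the exponential model \(y = a e^{bx}\). The choice of this model, the Gauss-Newton algorithm with step halving, the function names and the sample data are all assumptions made for illustration; the sources cited only state that numerical optimization is used. Passing `weights` turns the criterion into weighted least squares, while omitting it gives ordinary least squares.

```python
import math


def weighted_sse(x, y, w, a, b):
    """Sum of weighted squared residuals for the model y = a * exp(b * x)."""
    return sum(wi * (yi - a * math.exp(b * xi)) ** 2 for xi, yi, wi in zip(x, y, w))


def fit_exponential(x, y, a, b, weights=None, max_iter=100, tol=1e-12):
    """Gauss-Newton with step halving; (a, b) is the starting guess."""
    w = weights if weights is not None else [1.0] * len(x)
    current = weighted_sse(x, y, w, a, b)
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations (J^T W J) d = J^T W r.
        s11 = s12 = s22 = g1 = g2 = 0.0
        for xi, yi, wi in zip(x, y, w):
            e = math.exp(b * xi)
            r = yi - a * e          # residual of this observation
            j1, j2 = e, a * xi * e  # partial derivatives w.r.t. a and b
            s11 += wi * j1 * j1
            s12 += wi * j1 * j2
            s22 += wi * j2 * j2
            g1 += wi * j1 * r
            g2 += wi * j2 * r
        det = s11 * s22 - s12 * s12
        if det == 0.0:
            break  # degenerate system, e.g. when a == 0
        da = (g1 * s22 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        # Halve the step until the weighted SSE actually decreases.
        step = 1.0
        while step > 1e-8:
            trial = weighted_sse(x, y, w, a + step * da, b + step * db)
            if trial < current:
                break
            step /= 2.0
        else:
            break  # no improving step found: treat as converged
        a, b, current = a + step * da, b + step * db, trial
        if abs(step * da) + abs(step * db) < tol:
            break
    return a, b


# Made-up, noise-free data generated from y = 2 * exp(0.8 * x).
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [2.0 * math.exp(0.8 * xi) for xi in x]
a_hat, b_hat = fit_exponential(x, y, a=1.0, b=0.5)
print(a_hat, b_hat)
```

The result depends on the starting guess: a poor initial `(a, b)` can lead to slow convergence or to a different local minimum, which is one reason nonlinear regression is harder than the linear case.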
Sometimes nonlinear models are transformed to a linear domain, which makes the analysis linear and thus much easier to perform (as it does not require iterative optimization). This transformation changes the influence of the data values and the distribution of errors in the model, so it must be used with caution and preceded by careful examination of the data.
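A common example of such a transformation, shown here as a sketch, is taking logarithms of an exponential model: \(y = a e^{bx}\) becomes \(\ln y = \ln a + b x\), which is linear in \(\ln a\) and \(b\) and can be fitted with the closed-form formulas of simple linear regression. The model, the function name and the data are assumptions chosen for this illustration, and the approach requires all values of y to be positive.

```python
import math


def fit_exponential_by_log_transform(x, y):
    """Fit y = a * exp(b * x) by linear regression of ln(y) on x."""
    z = [math.log(v) for v in y]  # requires every y value to be positive
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b = sxz / sxx               # slope of the transformed (linear) model
    ln_a = mean_z - b * mean_x  # intercept of the transformed model
    return math.exp(ln_a), b


# Made-up, noise-free data generated from y = 2 * exp(0.8 * x).
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [2.0 * math.exp(0.8 * xi) for xi in x]
a_log, b_log = fit_exponential_by_log_transform(x, y)
print(a_log, b_log)
```

With noise-free data the transformed fit recovers the generating parameters. With noisy data it minimizes squared errors in \(\ln y\) rather than in \(y\), so small values of y gain influence relative to large ones, which is the change in error distribution that calls for caution.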
References
- Alma Ö.G. (2011), Comparison of Robust Regression Methods in Linear Regression, "International Journal of Contemporary Mathematical Sciences", vol. 6
- Anghelache C., et al. (2013), Multiple Regression Used in Macro-economic Analysis, "Revista Română de Statistică", Supplement Trim II/2013
- Kulcsár E. (2009), Multiple Regression Analysis of Main Economic Indicators in Tourism, "Revista de turism-studii si cercetari in turism"
- Lefter C. (2004), Marketing Researches, Infomarket, Brasov
- Oosterbaan R.J. (2002), Drainage research in farmers' fields: analysis of data. Part of project “Liquid Gold” of the International Institute for Land Reclamation and Improvement (ILRI), International Institute for Land Reclamation and Improvement, Wageningen
- Shyti B., Isa I., Paralloi S. (2017), Multiple Regressions for the Financial Analysis of Alabanian Economy, "Academic Journal of Interdisciplinary Studies", vol. 5
Footnotes
- Lefter C. 2004, p. 364
- Shyti B., Isa I., Paralloi S. 2017, p. 301
- Anghelache C., et al. 2013, p. 134
- Alma Ö.G. 2011, p. 409-411
- Kulcsár E. 2009, p. 63
- Oosterbaan R.J. 2002, p. 33
Author: Karolina Próchniak