Multiple regression analysis


Multiple regression analysis is a method of evaluating the relationship between two or more independent variables and one dependent variable[1].

The most common model used in multiple regression analysis is the linear regression model. In mathematical terms, performing regression analysis with this model means finding the "coefficient of multiple correlation (R) that defines the amount of linear correlation in between the dependent variable y and the independent variables \(x_1, x_2,…x_n\)"[2].

Linear multiple regression model

The relation between the dependent variable y and n independent variables \(x_1, x_2, ..., x_n\) can be expressed as the linear regression model\[y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_n x_n + ε\]

where \(ε\) is the residual term and \(β_k\) \((k = 0, 1, 2, ..., n)\) are the regression coefficients. \(β_0\) is a constant called the regression intercept, while \(β_1, β_2, ..., β_n\) are the regression slope parameters[3].

The goal of multiple regression analysis is to find all coefficients \(β_k\) in the above equation.
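The model above can be sketched directly in code. This is a minimal illustration with assumed (hypothetical) coefficient and data values, showing how one observation of the dependent variable is composed from the intercept, the slope terms, and the residual:

```python
# Linear multiple regression model with two independent variables:
# y = b0 + b1*x1 + b2*x2 + eps
# All numeric values below are assumed purely for illustration.

b0, b1, b2 = 2.0, 0.5, -1.5   # regression coefficients (intercept and slopes)
x1, x2 = 3.0, 1.0             # one observation of the independent variables
eps = 0.1                     # residual term for this observation

y = b0 + b1 * x1 + b2 * x2 + eps
print(y)  # 2.0 + 1.5 - 1.5 + 0.1 = 2.1
```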

Calculating regression coefficients

In linear models, calculating regression coefficients means finding the linear function (in the case of a single independent variable, represented by a straight line) that best fits the given data set. Depending on how "best fit" is defined (in statistics this problem is called goodness of fit), different methods can be used to perform the regression.

The most popular method of finding the regression coefficients of linear models is the ordinary least squares method. It minimizes the sum of squared distances from all points in the data set to the regression surface. Its popularity derives from its effectiveness and ease of calculation[4].
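Ordinary least squares can be sketched with numpy's `lstsq`, which minimizes the sum of squared residuals. The data below are synthetic, generated from assumed coefficients so the recovered estimates can be checked against them:

```python
import numpy as np

# Synthetic data generated from assumed coefficients b0=2.0, b1=0.5, b2=-1.5.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
eps = rng.normal(0, 0.5, n)            # residual term
y = 2.0 + 0.5 * x1 - 1.5 * x2 + eps

# Design matrix: a column of ones (for the intercept b0), then x1 and x2.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: minimizes the sum of squared residuals ||y - X*beta||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates close to [2.0, 0.5, -1.5]
```

The estimated coefficients approach the true values as the sample size grows and the residual variance shrinks.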

It must be noted that correlations between the independent variables of the model are possible (and sometimes hard to determine before the analysis). To detect them, the correlations between the independent variables must be analyzed. If the correlation between any two variables is high, one of them must be eliminated from the model[5].
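Such a check can be performed with a pairwise correlation matrix. In the sketch below (data assumed for illustration), `x2` is deliberately constructed as a near-copy of `x1`, so their correlation is close to 1 and one of them should be dropped:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.1, n)   # x2 nearly duplicates x1 (strong correlation)
x3 = rng.normal(0, 1, n)          # independent of x1 and x2

# Pairwise correlations between the candidate independent variables.
corr = np.corrcoef([x1, x2, x3])
print(corr.round(2))
# corr[0, 1] is close to 1, flagging that x1 or x2 should be eliminated;
# corr[0, 2] stays near 0, so x3 can remain in the model.
```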

Nonlinear regression

When the model is nonlinear, regression must be performed by an iterative procedure. Nonlinear regression analysis aims to find the nonlinear function that best fits the given data set; with a single independent variable, this function is a curve. The (nonlinear) coefficients of such a model are usually found with numerical optimization algorithms. When the dependent variable has constant variance, the ordinary least squares method may be used to minimize the sum of squared residuals; otherwise, the weighted least squares method, which minimizes the sum of weighted squared residuals, is used.
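One such iterative procedure is the Gauss–Newton method, sketched below for an assumed exponential model \(y = a·e^{bx}\) with synthetic data and a starting guess chosen reasonably close to the true values (both assumptions of this illustration, not prescriptions of the source):

```python
import numpy as np

# Synthetic data from an assumed nonlinear model y = 3.0 * exp(0.8 * x) + noise.
rng = np.random.default_rng(2)
x = np.linspace(0, 2, 50)
y = 3.0 * np.exp(0.8 * x) + rng.normal(0, 0.05, 50)

# Gauss-Newton iteration: repeatedly linearize the model around the current
# (a, b) and solve a linear least squares problem for the update step.
a, b = 2.5, 0.5                     # initial guess (assumed)
for _ in range(30):
    f = a * np.exp(b * x)           # model prediction at current parameters
    r = y - f                       # residuals
    # Jacobian of the model with respect to (a, b).
    J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
    step, *_ = np.linalg.lstsq(J, r, rcond=None)
    a, b = a + step[0], b + step[1]

print(a, b)  # estimates close to 3.0 and 0.8
```

Note that plain Gauss–Newton can diverge from a poor starting point; production implementations add damping (e.g. the Levenberg–Marquardt variant).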

Sometimes nonlinear models are transformed into a linear domain, making the analysis linear and thus much easier to perform (as it does not require iterative optimization). Such a transformation changes the influence of the data values and the distribution of errors in the model, so it must be used with caution and preceded by careful data examination[6].
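A classic example of such a transformation: the exponential model \(y = a·e^{bx}\) becomes the straight line \(\ln y = \ln a + bx\) after taking logarithms, so ordinary linear regression applies. The sketch below (model form and data assumed for illustration) uses multiplicative noise, which the log transform turns into additive noise; with additive noise the transform would distort the error distribution, which is exactly the caveat above:

```python
import numpy as np

# Assumed model: y = 3.0 * exp(0.8 * x), with multiplicative noise.
rng = np.random.default_rng(3)
x = np.linspace(0.1, 2, 40)
y = 3.0 * np.exp(0.8 * x) * np.exp(rng.normal(0, 0.02, 40))

# Log transform linearizes the model: ln(y) = ln(a) + b * x.
# A degree-1 polynomial fit then recovers the slope b and intercept ln(a).
slope, intercept = np.polyfit(x, np.log(y), 1)
a, b = np.exp(intercept), slope
print(a, b)  # estimates close to 3.0 and 0.8
```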



  1. Lefter C. 2004, p. 364
  2. Shyti B., Isa I., Paralloi S. 2017, p. 301
  3. Anghelache C., et al. 2013, p. 134
  4. Alma Ö.G. 2011, pp. 409–411
  5. Kulcsár E. 2009, p. 63
  6. Oosterbaan R.J. 2002, p. 33

Author: Karolina Próchniak