Mann-Whitney U test: Difference between revisions

From CEOpedia | Management online
(27 intermediate revisions by 3 users not shown)
Line 1: Line 1:
The '''Mann-Whitney U test''' (also known as the "Wilcoxon rank-sum test" (WRS)) for independent samples tests whether the central tendencies of two independent samples are different. The Mann-Whitney U test is used when the requirements for a t-test for independent samples are not met. The question posed by the Mann-Whitney U test for independent samples is often abbreviated thus: "Do the central tendencies of two independent samples differ?"
The '''Mann-Whitney U test''' for independent samples tests whether the central tendencies of two independent samples are different. The Mann-Whitney U test is used when the requirements for a t-test for independent samples are not met. The question posed by the Mann-Whitney U test for independent samples is often abbreviated thus: '''"Do the central tendencies of two independent samples differ?"'''


== Differences to the T-Test ==
==Differences to the T-Test==
The Mann-Whitney U test is the nonparametric equivalent of the independent-samples t-test and is used when the conditions for a parametric procedure are not met. Non-parametric procedures are also known as "prerequisite-free procedures" because they have lower requirements for the distribution of the measured values in the population. For example, the data need not be normally distributed and the variables need only be ordinally scaled. A Mann-Whitney U test can also be calculated for small samples and outliers.<ref>UZH (2022) [https://www.methodenberatung.uzh.ch/de/datenanalyse_spss/unterschiede/zentral/mann.html#1.2._Voraussetzungen_des_Mann-Whitney-U-Tests Mann-Whtiney-U-Test.], online</ref>.
The Mann-Whitney U test is the '''nonparametric''' equivalent of the independent-samples t-test and is used when the conditions for a parametric procedure are not met. Non-parametric procedures are also known as "prerequisite-free procedures" because they have lower requirements for the distribution of the measured values in the population. For example, the data [[need]] not be normally distributed and the variables need only be ordinally scaled. A Mann-Whitney U test can also be calculated for small samples and outliers<ref>University of Zurich (2022)</ref>.


== Assumptions ==
==Assumptions for the Test==
* Null Hypothesis:
Null Hypothesis:
** The null hypothesis assumes that both groups under investigation are studied with the same population.  
* The null hypothesis assumes that both groups under investigation are studied with the same population.  
** The two independent groups must be homogeneous and have the same distribution.  
* The two independent groups must be homogeneous and have the same distribution.  


If a 2-sided test occurs, the alternative hypothesis T1, which is tested against the null hypothesis, is that the first population is different from the second population. In this case, the null hypothesis is rejected.
If a 2-sided test occurs, the alternative '''hypothesis T1''', which is tested against the null hypothesis, is that the first population is different from the second population. In this case, the null hypothesis is rejected.  


Assumptions:
Assumptions for the test<ref>Robert, M., Allaire, D. (1988)</ref><ref>Nachar, N. (2008), pp. 14-15</ref><ref>University of Zurich (2022)</ref>:
* The two groups studied must be drawn at random from the target population. The concept of randomness implies the absence of measurement and sampling error <ref>Robert, M., Allaire, D. (1988). [https://kolibris.univ-antilles.fr/discovery/fulldisplay/alma991000611799705746/33UAG_INST:33UAG_INST Fondements et étapes de la recherche scientifique en psychologie.]</ref>. Note that error of the latter types may be included but must remain small.
* The two groups studied must be drawn at random from the [[target population]]. The concept of randomness implies the [[absence]] of measurement and [[sampling]] error. Note that error of the latter types may be included but must remain small.
* Every measurement must come from a different participant.
* Every measurement must come from a different participant.
* The scale for data measurement is ordinal or continuous type. The observation values are then of ordinal, relative, or absolute scale type <ref>Nachar, N. (2008). [https://pdfs.semanticscholar.org/007b/c0936646c34abd369ceda930000c3d142228.pdf The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution.], pp. 14-15</ref>.
* The scale for data measurement is ordinal or continuous type. The observation values are then of ordinal, relative, or absolute scale type.
*There is an independent variable by means of which the two groups to be compared are formed <ref>UZH (2022). [https://www.methodenberatung.uzh.ch/de/datenanalyse_spss/unterschiede/zentral/mann.html#1.2._Voraussetzungen_des_Mann-Whitney-U-Tests Mann-Whtiney-U-Test.], online</ref>.
* There is an independent variable by means of which the two groups to be compared are formed.


== The Test ==
==The Test==
The Mann-Whitney U test first requires the calculation of a U statistic for each group. These statistics have a known distribution under the null hypothesis established by Mann and Whitney (1947)<ref>Mann, H., Whitney, D. (1947). [https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-18/issue-1/On-a-Test-of-Whether-one-of-Two-Random-Variables/10.1214/aoms/1177730491.full On a test of whether one of 2 random variables is stochastically larger than the other.]</ref>. Mathematically, the Mann-Whitney U statistic is defined as follows: <ref>Divine, G., Norton, H., Baron, A., & Juarez, E. (2017). [https://www.tandfonline.com/doi/pdf/10.1080/00031305.2017.1305291?needAccess=true The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians], p. 279</ref>
The Mann-Whitney U test first requires the calculation of a U statistic for each group. The statistics for each group have a known distribution proposed by Mann and Whitney (1947)<ref>Mann, H., Whitney, D. (1947)</ref>. In mathematical terms, the Mann-Whitney U statistic is defined as follows<ref>Divine, G., Norton, H., Baron, A., & Juarez, E. (2017), p. 279</ref>:


<math> U_x= n_x n_y + \frac{(n_x(n_x+ 1)}{2}) R_x</math>
<math> U_x= n_x n_y + \frac{n_x(n_x+ 1)}{2} - R_x</math>


And therefore for 2 Groups:
And therefore for 2 Groups:


<math> U_1= n_1 n_2 + \frac{(n_1(n_1+ 1)}{2}) R_1</math>
<math> U_1= n_1 n_2 + \frac{n_1(n_1+ 1)}{2} - R_1</math>


And:
And:


<math> U_2= n_1 n_2 + \frac{(n_2(n_2+ 1)}{2}) R_2</math>
<math> U_2= n_1 n_2 + \frac{n_2(n_2+ 1)}{2} - R_2</math>


n<sub>x</sub> is defined as the number of observations/participants in the first group, n<sub>y</sub> the number of observations/participants in the second group, and R<sub>x</sub> the sum of the ranks in group x. After calculating the U statistic and determining an appropriate statistical threshold (α), the null hypothesis may or may not be rejected.
n<sub>x</sub> is defined as the number of observations/participants in the first group, n<sub>y</sub> the number of observations/participants in the second group, and R<sub>x</sub> the sum of the ranks in group x. After calculating the U statistic and determining an appropriate statistical threshold (α), the null hypothesis may or may not be rejected.


H<sub>0</sub> is rejected if, according to Mann and Whitney's tables, the p corresponding to min (U<sub>x</sub>,U<sub>y</sub>) (the smallest of the two calculated U) is smaller than the p or the specified α-threshold. In technical terms, reject H<sub>0</sub> if p of min (U<sub>x</sub>,U<sub>y</sub>) <α threshold<ref>Nachar, N. (2008). [https://pdfs.semanticscholar.org/007b/c0936646c34abd369ceda930000c3d142228.pdf The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution.], pp. 14-17</ref>.
H<sub>0</sub> is rejected if, according to Mann and Whitney's tables, the p corresponding to min (U<sub>x</sub>,U<sub>y</sub>) (the smallest of the two calculated U) is smaller than the p or the specified α-threshold. Technically, reject H<sub>0</sub> if p of min (U<sub>x</sub>,U<sub>y</sub>) <α threshold<ref>Nachar, N. (2008), pp. 14-17</ref>.


== Example ==
==Example of U-Test==
20 patients of a hospital are examined. 12 of them are under cardiological treatment, while 8 are not. They all answer a questionnaire on general well-being (scores from 0 to 35, 0 representing very high, 35 very low well-being). The aim is to test whether there are differences in terms of central tendency of well-being between the cardiac patients and the other patients. The dataset to be analyzed contains, in addition to the subject number (ID), the grouping variable (Group), which takes the value 1 for cardiac patients and 2 for other patients, and the well-being value (Data).
20 patients of a hospital are examined. 12 of them are under cardiological treatment, while 8 are not. They all answer a questionnaire on general well-being (scores from 0 to 35, 0 representing very high, 35 very low well-being). The aim is to test whether there are differences in terms of [[central tendency]] of well-being between the cardiac patients and the other patients. The dataset to be analyzed contains, in addition to the subject number (ID), the grouping variable (Group), which takes the value 1 for cardiac patients and 2 for other patients, and the well-being value (Data).


The Mann-Whitney U test is based on the idea of ranking the data. That is, it is not calculated with the measured values themselves, but these are replaced by ranks with which the actual test is performed. Thus, the calculation of the test is based exclusively on the ordering of the data (greater than, less than). The absolute distances between the values are not taken into account.
The Mann-Whitney U test works by ranking the data. That is, it is not calculated with the measured values themselves, but these are replaced by ranks with which the actual test is performed. Thus, the calculation of the test is based exclusively on the ordering of the data (greater than, less than). The absolute distances between the values are not taken into account.


In the first step, the measured values are ranked according to their magnitude (to be seen in the Well-being column). This ranking is independent of group membership. Subsequently, the measured values are ranked (starting from 1 and ascending), whereby a distinction is made here between the groups. If the same measured value occurs several times, the mean value is formed in so-called "linked ranks". Afterwards, rank sums are formed for the two groups by adding up the ranks within the groups.
In the first step, the measured values are ranked according to their magnitude (to be seen in the Well-being column). This ranking is independent of group membership. Subsequently, the measured values are ranked (starting from 1 and ascending), whereby a distinction is made here between the groups. If the same measured value occurs several times, the mean value is formed in so-called "linked ranks". Afterwards, rank sums are formed for the two groups by adding up the ranks within the groups.
Line 47: Line 47:
| 5 || 1 || 0 || 1 ||  
| 5 || 1 || 0 || 1 ||  
|-
|-
| 6 || 2 || 1 || || 2
| 6 || 2 || 1 || || 2
|-
|-
| 14 || 2 || 2 || || 3
| 14 || 2 || 2 || || 3
|-
|-
| 9 || 2 || 3 || || 4
| 9 || 2 || 3 || || 4
|-
|-
| 18 || 2 || 4 || || 5
| 18 || 2 || 4 || || 5
|-
|-
| 10 || 1 || 5 || 6 ||  
| 10 || 1 || 5 || 6 ||  
Line 59: Line 59:
| 19 || 1 || 5.5 || 7 ||  
| 19 || 1 || 5.5 || 7 ||  
|-
|-
| 1 || 2 || 6 || || 8
| 1 || 2 || 6 || || 8
|-
|-
| 8 || 2 || 6.5 || || 9
| 8 || 2 || 6.5 || || 9
|-
|-
| 17 || 1 || 7 || 10 ||  
| 17 || 1 || 7 || 10 ||  
|-
|-
| 15 || 2 || 7.5 || || 11
| 15 || 2 || 7.5 || || 11
|-
|-
| 11 || 1 || 8 || 12 ||  
| 11 || 1 || 8 || 12 ||  
|-
|-
| 3 || 2 || 8.5 || || 13
| 3 || 2 || 8.5 || || 13
|-
|-
| 2 || 1 || 9 || 14 ||  
| 2 || 1 || 9 || 14 ||  
Line 85: Line 85:
| 13 || 1 || 33 || 20 ||  
| 13 || 1 || 33 || 20 ||  
|-
|-
! | Rank sum || || || 155 || 55
! | Rank sum || || || 155 || 55
|}
|}


For group 1 the rank sum is 155 (n=12), for group 2 55 (n=8). To calculate U, the larger of the two rank sums is used.
For group 1 the rank sum is 155 (n=12), for group 2 55 (n=8). To calculate U, the larger of the two rank sums is used.


<math>U = n_1n_2+ \frac{n_1(n_1+1)}{2} -R_1</math>
<math>U = n_1n_2+ \frac{n_1(n_1+1)}{2} - R_1</math>


n<sub>1</sub> = Sample size of the group with the larger rank sum.
n<sub>1</sub> = Sample size of the group with the larger rank sum.
Line 97: Line 97:
Thus, it follows:  
Thus, it follows:  


<math>U = 12*8+ \frac{12(12+1)}{2} -155 = 19</math>
<math>U = 12*8+ \frac{12(12+1)}{2} - 155 = 19</math>


=== Significance ===
===Testing of Significance===
If the sample size is large enough (n<sub>1</sub>+n<sub>2</sub> > 30), significance can be tested. Here z is calculated:  
If the sample size is large enough (n<sub>1</sub>+n<sub>2</sub> > 30), significance can be tested. Here z is calculated:  


<math>z = \frac{U-μ_U}{σ_U}=\frac{U-\frac{n_1*n_2}{2}}{\sqrt{\frac{n_1*n_2(n_1+n_2+1)}{12}}}</math>
<math>z = \frac{U-\mu_U}{\sigma_U}=\frac{U-\frac{n_1*n_2}{2}}{\sqrt{\frac{n_1*n_2(n_1+n_2+1)}{12}}}</math>


μ= mean of the U-distribution (U-value, without difference between groups).
μ= mean of the U-distribution (U-value, without difference between groups).


σ= Standard Error of the U-Value
σ= [[Standard]] Error of the U-Value


n<sub>1</sub>= sample size of the group with the larger rank sum
n<sub>1</sub>= sample size of the group with the larger rank sum
Line 112: Line 112:
n<sub>2</sub>= sample size of the group with the smaller rank sum
n<sub>2</sub>= sample size of the group with the smaller rank sum


<math>z = \frac{U-μ_U}{σ_U}=\frac{19-\frac{12*8}{2}}{\sqrt{\frac{12*8(12+8+1)}{12}}}</math>
<math>z = \frac{U-\mu_U}{\sigma_U}=\frac{19-\frac{12*8}{2}}{\sqrt{\frac{12*8(12+8+1)}{12}}}</math>


This z-value is now compared with the critical value of the standard normal distribution. For the two-sided significance level .05, it is ±1.96. If the magnitude of the test statistic is higher than the critical value, the difference between the two groups is significant <ref>UZH (2022). [https://www.methodenberatung.uzh.ch/de/datenanalyse_spss/unterschiede/zentral/mann.html#1.2._Voraussetzungen_des_Mann-Whitney-U-Tests Mann-Whtiney-U-Test.], online</ref>.
This z-value is now compared with the critical value of the standard [[normal distribution]]. For the two-sided significance level .05, it is ±1.96. If the magnitude of the test statistic is higher than the critical value, the difference between the two groups is significant <ref>University of Zurich (2022)</ref>.


==Footnotes==
==Footnotes==
<references/>
<references/>
{{infobox5|list1={{i5link|a=[[Confidence level]]}} &mdash; {{i5link|a=[[Adjusted mean]]}} &mdash; {{i5link|a=[[Interval scale]]}} &mdash; {{i5link|a=[[Nominal scale]]}} &mdash; {{i5link|a=[[Anderson darling normality test]]}} &mdash; {{i5link|a=[[Statistical significance]]}} &mdash; {{i5link|a=[[Two-way ANOVA]]}} &mdash; {{i5link|a=[[Monte carlo method]]}} &mdash; {{i5link|a=[[Random error]]}} }}


==References==
==References==
* Divine, G., Norton, H., Baron, A., & Juarez, E. (2017). [https://www.tandfonline.com/doi/pdf/10.1080/00031305.2017.1305291?needAccess=true The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians]. The American Statistician. 72. 10.1080/00031305.2017.1305291.
* Divine, G., Norton, H., Baron, A., & Juarez, E. (2017). [https://www.tandfonline.com/doi/pdf/10.1080/00031305.2017.1305291?needAccess=true ''The Wilcoxon-Mann-Whitney Procedure Fails as a Test of Medians''], The American Statistician, 72. 10.1080/00031305.2017.1305291
* Mann, H., Whitney, D. (1947). [https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-18/issue-1/On-a-Test-of-Whether-one-of-Two-Random-Variables/10.1214/aoms/1177730491.full On a test of whether one of 2 random variables is stochastically larger than the other.] Annals of Mathematical Statistics, 18, 50‐60.
* Mann, H., Whitney, D. (1947). [https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-18/issue-1/On-a-Test-of-Whether-one-of-Two-Random-Variables/10.1214/aoms/1177730491.full ''On a test of whether one of 2 random variables is stochastically larger than the other.''], Annals of Mathematical Statistics, 18, 50‐60
* Nachar, N. (2008). [https://pdfs.semanticscholar.org/007b/c0936646c34abd369ceda930000c3d142228.pdf The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution.] Tutorials in Quantitative Methods for Psychology. 13-17
* Nachar, N. (2008). [https://pdfs.semanticscholar.org/007b/c0936646c34abd369ceda930000c3d142228.pdf ''The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution.''], Tutorials in Quantitative Methods for Psychology, 13-17
* Robert, M., Allaire, D. (1988). [https://kolibris.univ-antilles.fr/discovery/fulldisplay/alma991000611799705746/33UAG_INST:33UAG_INST Fondements et étapes de la recherche scientifique en psychologie.] Saint‐Hyacinthe : Edisem et Paris : Maloine.
* Robert, M., Allaire, D. (1988). [https://kolibris.univ-antilles.fr/discovery/fulldisplay/alma991000611799705746/33UAG_INST:33UAG_INST ''Fondements et étapes de la recherche scientifique en psychologie.''], Saint‐Hyacinthe : Edisem et Paris : Maloine
* UZH (2022). [https://www.methodenberatung.uzh.ch/de/datenanalyse_spss/unterschiede/zentral/mann.html#1.2._Voraussetzungen_des_Mann-Whitney-U-Tests Mann-Whitney-U-Test.] University of Zurich.
* [https://www.methodenberatung.uzh.ch/de/datenanalyse_spss/unterschiede/zentral/mann.html#1.2._Voraussetzungen_des_Mann-Whitney-U-Tests ''Mann-Whitney-U-Test.''] (2022), University of Zurich


{{a|Sven Korten}}
{{a|Sven Korten}}
[[Category:Statistics]]
[[Category:Statistics]]

Latest revision as of 09:50, 18 November 2023

The Mann-Whitney U test for independent samples tests whether the central tendencies of two independent samples are different. The Mann-Whitney U test is used when the requirements for a t-test for independent samples are not met. The question posed by the Mann-Whitney U test for independent samples is often abbreviated thus: "Do the central tendencies of two independent samples differ?"

Differences to the T-Test

The Mann-Whitney U test is the nonparametric equivalent of the independent-samples t-test and is used when the conditions for a parametric procedure are not met. Nonparametric procedures are also known as "distribution-free" procedures because they place weaker requirements on the distribution of the measured values in the population. For example, the data need not be normally distributed, and the variables need only be ordinally scaled. The Mann-Whitney U test can also be applied to small samples and to data containing outliers[1].

Assumptions for the Test

Null Hypothesis:

  • The null hypothesis assumes that both groups under investigation are drawn from the same population.
  • Under the null hypothesis, the two independent groups are homogeneous and have the same distribution.

In a two-sided test, the alternative hypothesis H1, which is tested against the null hypothesis, states that the first population differs from the second population. If the test result is significant, the null hypothesis is rejected in favor of H1.

Assumptions for the test[2][3][4]:

  • The two groups studied must be drawn at random from the target population. Ideally, randomness implies the absence of measurement and sampling error; in practice, such errors may be present but must remain small.
  • Every measurement must come from a different participant.
  • The data are measured on an ordinal or continuous scale. The observation values are then of ordinal, relative, or absolute scale type.
  • There is an independent variable by means of which the two groups to be compared are formed.

The Test

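The Mann-Whitney U test first requires the calculation of a U statistic for each group. Under the null hypothesis, these statistics follow a known distribution established by Mann and Whitney (1947)[5]. Mathematically, the Mann-Whitney U statistic is defined as follows[6]:

<math> U_x = n_x n_y + \frac{n_x(n_x+1)}{2} - R_x</math>

And therefore for 2 groups:

<math> U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1</math>

And:

<math> U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2</math>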

n<sub>x</sub> is defined as the number of observations/participants in the first group, n<sub>y</sub> as the number of observations/participants in the second group, and R<sub>x</sub> as the sum of the ranks in group x (R<sub>y</sub> is defined analogously for the second group). After calculating the U statistics and choosing an appropriate significance threshold (α), the null hypothesis may or may not be rejected.

H<sub>0</sub> is rejected if, according to Mann and Whitney's tables, the p-value corresponding to min(U<sub>x</sub>, U<sub>y</sub>) (the smaller of the two calculated U values) is smaller than the specified α threshold. Formally, reject H<sub>0</sub> if p of min(U<sub>x</sub>, U<sub>y</sub>) < α[7].
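As a minimal sketch of the definition above, the following Python snippet computes both U statistics from two rank sums and takes the smaller one. The function name `u_statistics` and the toy rank sums are hypothetical and chosen only for illustration; looking up the p-value in Mann and Whitney's tables is not shown.

```python
def u_statistics(rank_sum_x, rank_sum_y, n_x, n_y):
    """Return (U_x, U_y) with U = n_x*n_y + n(n+1)/2 - R for each group."""
    u_x = n_x * n_y + n_x * (n_x + 1) / 2 - rank_sum_x
    u_y = n_x * n_y + n_y * (n_y + 1) / 2 - rank_sum_y
    return u_x, u_y


# Hypothetical toy data: group x holds ranks 1, 2, 4 and group y holds ranks 3, 5.
u_x, u_y = u_statistics(rank_sum_x=7, rank_sum_y=8, n_x=3, n_y=2)

print(u_x, u_y)        # 5.0 1.0
print(min(u_x, u_y))   # 1.0, the value that is compared against the tables
```

A quick plausibility check is that the two statistics always add up to n<sub>x</sub> · n<sub>y</sub> (here 5 + 1 = 6).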

Example of U-Test

20 patients of a hospital are examined. 12 of them are under cardiological treatment, while 8 are not. They all answer a questionnaire on general well-being (scores from 0 to 35, 0 representing very high, 35 very low well-being). The aim is to test whether there are differences in terms of central tendency of well-being between the cardiac patients and the other patients. The dataset to be analyzed contains, in addition to the subject number (ID), the grouping variable (Group), which takes the value 1 for cardiac patients and 2 for other patients, and the well-being value (Data).

The Mann-Whitney U test works by ranking the data. That is, it is not calculated with the measured values themselves, but these are replaced by ranks with which the actual test is performed. Thus, the calculation of the test is based exclusively on the ordering of the data (greater than, less than). The absolute distances between the values are not taken into account.

In the first step, the measured values are sorted by magnitude (see the General well-being column), regardless of group membership. Ranks are then assigned in ascending order, starting from 1, and each rank is entered in the column of the group to which the observation belongs. If the same measured value occurs several times, the tied observations each receive the mean of the ranks they occupy (so-called "tied ranks"). Afterwards, rank sums are formed for the two groups by adding up the ranks within each group.
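The ranking step, including the averaging of ranks for repeated values, could be sketched in Python as follows. The helper `average_ranks` and its input list are hypothetical illustrations, not part of the cited sources.

```python
def average_ranks(values):
    """Rank values ascending from 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of the ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# The two observations with value 3 would occupy ranks 3 and 4, so each gets 3.5.
print(average_ranks([3, 1, 3, 2]))  # [3.5, 1.0, 3.5, 2.0]
```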

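{|
|+ Example Data
! ID !! Group !! General well-being !! Ranking Group 1 !! Ranking Group 2
|-
| 5 || 1 || 0 || 1 ||
|-
| 6 || 2 || 1 || || 2
|-
| 14 || 2 || 2 || || 3
|-
| 9 || 2 || 3 || || 4
|-
| 18 || 2 || 4 || || 5
|-
| 10 || 1 || 5 || 6 ||
|-
| 19 || 1 || 5.5 || 7 ||
|-
| 1 || 2 || 6 || || 8
|-
| 8 || 2 || 6.5 || || 9
|-
| 17 || 1 || 7 || 10 ||
|-
| 15 || 2 || 7.5 || || 11
|-
| 11 || 1 || 8 || 12 ||
|-
| 3 || 2 || 8.5 || || 13
|-
| 2 || 1 || 9 || 14 ||
|-
| 20 || 1 || 11 || 15 ||
|-
| 12 || 1 || 13 || 16 ||
|-
| 16 || 1 || 28 || 17 ||
|-
| 4 || 1 || 29 || 18 ||
|-
| 7 || 1 || 32 || 19 ||
|-
| 13 || 1 || 33 || 20 ||
|-
! Rank sum || || || 155 || 55
|}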

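For group 1 the rank sum is 155 (n = 12); for group 2 it is 55 (n = 8). To calculate U, the larger of the two rank sums is used:

<math>U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1</math>

n<sub>1</sub> = sample size of the group with the larger rank sum

n<sub>2</sub> = sample size of the group with the smaller rank sum

R<sub>1</sub> = larger rank sum

Thus, it follows:

<math>U = 12 \cdot 8 + \frac{12(12+1)}{2} - 155 = 19</math>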
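The calculation for the example can be reproduced with a short Python sketch. The (group, score) pairs are copied from the Example Data table; the variable names are illustrative. Because the example contains no repeated scores, plain consecutive ranks are sufficient here, which would not hold for data with ties.

```python
# (group, well-being score) pairs copied from the Example Data table.
data = [
    (1, 0), (2, 1), (2, 2), (2, 3), (2, 4), (1, 5), (1, 5.5), (2, 6),
    (2, 6.5), (1, 7), (2, 7.5), (1, 8), (2, 8.5), (1, 9), (1, 11),
    (1, 13), (1, 28), (1, 29), (1, 32), (1, 33),
]

# Sort by score regardless of group, then assign ranks 1..20 (no ties here).
ranked = sorted(data, key=lambda pair: pair[1])
rank_sum = {1: 0, 2: 0}
size = {1: 0, 2: 0}
for rank, (group, _score) in enumerate(ranked, start=1):
    rank_sum[group] += rank
    size[group] += 1

# n1 and R1 belong to the group with the larger rank sum, n2 to the other group.
larger = max(rank_sum, key=rank_sum.get)
smaller = min(rank_sum, key=rank_sum.get)
n1, n2, r1 = size[larger], size[smaller], rank_sum[larger]

u = n1 * n2 + n1 * (n1 + 1) // 2 - r1
print(rank_sum, u)  # {1: 155, 2: 55} 19
```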

Testing of Significance

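If the sample size is large enough (n<sub>1</sub>+n<sub>2</sub> > 30), significance can be tested with a normal approximation. For this purpose, U is standardized into a z-value:

<math>z = \frac{U-\mu_U}{\sigma_U}=\frac{U-\frac{n_1 \cdot n_2}{2}}{\sqrt{\frac{n_1 \cdot n_2(n_1+n_2+1)}{12}}}</math>

μ<sub>U</sub> = mean of the U distribution (the U value expected if the groups do not differ)

σ<sub>U</sub> = standard error of the U value

n<sub>1</sub> = sample size of the group with the larger rank sum

n<sub>2</sub> = sample size of the group with the smaller rank sum

For the example data:

<math>z = \frac{U-\mu_U}{\sigma_U}=\frac{19-\frac{12 \cdot 8}{2}}{\sqrt{\frac{12 \cdot 8(12+8+1)}{12}}} \approx -2.24</math>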

This z-value is now compared with the critical value of the standard normal distribution. For the two-sided significance level .05, it is ±1.96. If the magnitude of the test statistic is higher than the critical value, the difference between the two groups is significant [8].
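The comparison with the critical value can be sketched in Python using only the standard library. The values U = 19, n<sub>1</sub> = 12 and n<sub>2</sub> = 8 come from the example; the variable names and the two-sided p-value at the end are illustrative additions. Note that the example has only 20 observations, fewer than the n<sub>1</sub>+n<sub>2</sub> > 30 stated above, so the result should be read as an illustration of the arithmetic rather than a definitive analysis.

```python
from math import sqrt
from statistics import NormalDist

u, n1, n2 = 19, 12, 8  # values from the worked example

mu_u = n1 * n2 / 2                             # mean of U under the null hypothesis
sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # standard error of U (no tie correction)
z = (u - mu_u) / sigma_u

# Two-sided p-value from the standard normal distribution.
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(round(z, 2))     # -2.24
print(abs(z) > 1.96)   # True: significant at the two-sided .05 level
print(p_two_sided)
```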

Footnotes

  1. University of Zurich (2022)
  2. Robert, M., Allaire, D. (1988)
  3. Nachar, N. (2008), pp. 14-15
  4. University of Zurich (2022)
  5. Mann, H., Whitney, D. (1947)
  6. Divine, G., Norton, H., Baron, A., & Juarez, E. (2017), p. 279
  7. Nachar, N. (2008), pp. 14-17
  8. University of Zurich (2022)



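References

  • Divine, G., Norton, H., Baron, A., & Juarez, E. (2017). The Wilcoxon-Mann-Whitney Procedure Fails as a Test of Medians, The American Statistician, 72. 10.1080/00031305.2017.1305291
  • Mann, H., Whitney, D. (1947). On a test of whether one of 2 random variables is stochastically larger than the other, Annals of Mathematical Statistics, 18, 50-60
  • Nachar, N. (2008). The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution, Tutorials in Quantitative Methods for Psychology, 13-17
  • Robert, M., Allaire, D. (1988). Fondements et étapes de la recherche scientifique en psychologie, Saint-Hyacinthe: Edisem et Paris: Maloine
  • University of Zurich (2022). Mann-Whitney-U-Test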

Author: Sven Korten