Mann-Whitney U test
Latest revision as of 08:50, 18 November 2023
The Mann-Whitney U test for independent samples tests whether the central tendencies of two independent samples are different. The Mann-Whitney U test is used when the requirements for a t-test for independent samples are not met. The question posed by the Mann-Whitney U test for independent samples is often abbreviated thus: "Do the central tendencies of two independent samples differ?"
Differences to the T-Test
The Mann-Whitney U test is the nonparametric equivalent of the independent-samples t-test and is used when the conditions for a parametric procedure are not met. Nonparametric procedures are also known as "prerequisite-free procedures" because they place weaker requirements on the distribution of the measured values in the population. For example, the data need not be normally distributed and the variables need only be ordinally scaled. A Mann-Whitney U test can also be used with small samples and in the presence of outliers[1].
Assumptions for the Test
Null hypothesis:
- The null hypothesis assumes that both groups under investigation are drawn from the same population.
- The two independent groups are homogeneous and have the same distribution.
In a two-sided test, the alternative hypothesis H_1, which is tested against the null hypothesis, is that the first population differs from the second population. If the data support H_1, the null hypothesis is rejected.
Assumptions for the test[2][3][4]:
- The two groups studied must be drawn at random from the target population. Randomness implies the absence of measurement and sampling error; in practice such errors may be present but must remain small.
- Every measurement must come from a different participant.
- The data are measured on an ordinal or continuous scale. The observed values are then of ordinal, relative, or absolute scale type.
- There is an independent variable by means of which the two groups to be compared are formed.
The Test
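The Mann-Whitney U test first requires the calculation of a U statistic for each group. Under the null hypothesis these statistics have a known distribution, derived by Mann and Whitney (1947)[5]. In mathematical terms, the Mann-Whitney U statistic is defined as follows[6]:
$$U_x = n_x n_y + \frac{n_x(n_x+1)}{2} - R_x$$
And therefore for two groups:
$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$$
And:
$$U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$$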
n_x is the number of observations (participants) in the first group, n_y the number of observations (participants) in the second group, and R_x the sum of the ranks assigned to group x. After calculating the U statistic and choosing an appropriate significance threshold (α), the null hypothesis is either rejected or retained.
H_0 is rejected if, according to Mann and Whitney's tables, the p-value corresponding to min(U_x, U_y), the smaller of the two calculated U values, is below the specified α threshold. Formally, reject H_0 if p(min(U_x, U_y)) < α[7].
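The definition of the two U statistics can be illustrated with a minimal Python sketch. The function name `u_statistics` and the toy ranks are hypothetical and serve only as an illustration; the formula itself is the one stated in this section.

```python
def u_statistics(n_x, n_y, r_x, r_y):
    """Return (U_x, U_y) from the group sizes and the rank sums."""
    product = n_x * n_y
    u_x = product + n_x * (n_x + 1) / 2 - r_x
    u_y = product + n_y * (n_y + 1) / 2 - r_y
    return u_x, u_y


# Hypothetical toy data: group x holds ranks 1, 2, 4; group y holds ranks 3, 5.
u_x, u_y = u_statistics(n_x=3, n_y=2, r_x=1 + 2 + 4, r_y=3 + 5)
u_min = min(u_x, u_y)  # the value that is looked up in the tables
print(u_x, u_y, u_min)
```

A useful sanity check is that the two statistics always add up to n_x · n_y, so only one of them needs to be computed from scratch.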
Example of U-Test
20 patients of a hospital are examined. 12 of them are under cardiological treatment, while 8 are not. They all answer a questionnaire on general well-being (scores from 0 to 35, 0 representing very high, 35 very low well-being). The aim is to test whether there are differences in terms of central tendency of well-being between the cardiac patients and the other patients. The dataset to be analyzed contains, in addition to the subject number (ID), the grouping variable (Group), which takes the value 1 for cardiac patients and 2 for other patients, and the well-being value (Data).
The Mann-Whitney U test works by ranking the data. That is, it is not calculated with the measured values themselves, but these are replaced by ranks with which the actual test is performed. Thus, the calculation of the test is based exclusively on the ordering of the data (greater than, less than). The absolute distances between the values are not taken into account.
In the first step, the measured values are sorted by magnitude (see the "General well-being" column), regardless of group membership. The sorted values are then assigned ranks (starting from 1 and ascending), and each rank is recorded in the column of the group the value belongs to. If the same measured value occurs several times, the tied values all receive the mean of the ranks they occupy (so-called "tied ranks"). Finally, a rank sum is formed for each group by adding up the ranks within that group.
ID | Group | General well-being | Rank (Group 1) | Rank (Group 2) |
---|---|---|---|---|
5 | 1 | 0 | 1 | |
6 | 2 | 1 | | 2 |
14 | 2 | 2 | | 3 |
9 | 2 | 3 | | 4 |
18 | 2 | 4 | | 5 |
10 | 1 | 5 | 6 | |
19 | 1 | 5.5 | 7 | |
1 | 2 | 6 | | 8 |
8 | 2 | 6.5 | | 9 |
17 | 1 | 7 | 10 | |
15 | 2 | 7.5 | | 11 |
11 | 1 | 8 | 12 | |
3 | 2 | 8.5 | | 13 |
2 | 1 | 9 | 14 | |
20 | 1 | 11 | 15 | |
12 | 1 | 13 | 16 | |
16 | 1 | 28 | 17 | |
4 | 1 | 29 | 18 | |
7 | 1 | 32 | 19 | |
13 | 1 | 33 | 20 | |
Rank sum | | | 155 | 55 |
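The ranking step can be reproduced with a short Python sketch. The helper `midranks` is a hypothetical name, not part of the original article; it assigns tied values the mean of the ranks they occupy, although the example data happen to contain no ties.

```python
def midranks(values):
    """Rank values ascending from 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values equal to the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Well-being scores from the table, split by group.
group_1 = [0, 5, 5.5, 7, 8, 9, 11, 13, 28, 29, 32, 33]
group_2 = [1, 2, 3, 4, 6, 6.5, 7.5, 8.5]

ranks = midranks(group_1 + group_2)  # ranking ignores group membership
rank_sum_1 = sum(ranks[:len(group_1)])
rank_sum_2 = sum(ranks[len(group_1):])
print(rank_sum_1, rank_sum_2)  # 155.0 55.0
```

The two rank sums always add up to N(N+1)/2 for N observations in total (here 20 · 21 / 2 = 210), which is a quick way to catch ranking mistakes.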
For group 1 the rank sum is 155 (n = 12); for group 2 it is 55 (n = 8). To calculate U, the larger of the two rank sums is used:
$$U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$$
n_1 = sample size of the group with the larger rank sum
n_2 = sample size of the group with the smaller rank sum
R_1 = larger rank sum
Thus, it follows:
$$U = 12 \cdot 8 + \frac{12(12+1)}{2} - 155 = 19$$
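The U value for this example can be cross-checked in Python. The sketch below computes U once from the rank-sum formula and once by direct pair counting; the pair-counting view is not part of the original article and is added here only as an assumed, equivalent check (with this form of the formula, U counts the pairs in which the group 1 value is the smaller one, with ties counted as one half).

```python
# Well-being scores from the example table.
group_1 = [0, 5, 5.5, 7, 8, 9, 11, 13, 28, 29, 32, 33]
group_2 = [1, 2, 3, 4, 6, 6.5, 7.5, 8.5]

n_1, n_2 = len(group_1), len(group_2)
r_1 = 155  # larger rank sum, taken from the table

# U from the rank-sum formula.
u_formula = n_1 * n_2 + n_1 * (n_1 + 1) / 2 - r_1

# U by counting pairs (x from group 1, y from group 2) with x < y;
# a tie would contribute 0.5.
u_pairs = sum(
    1.0 if x < y else 0.5 if x == y else 0.0
    for x in group_1
    for y in group_2
)

print(u_formula, u_pairs)  # 19.0 19.0
```

Both routes give the same value, which suggests that the ranks in the table and the rank sum of 155 are consistent with the raw scores.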
Testing of Significance
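If the sample size is large enough (n_1 + n_2 > 30), significance can be tested using a normal approximation. Here z is calculated:
$$z = \frac{U-\mu_U}{\sigma_U} = \frac{U-\frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}}}$$
μ_U = mean of the U distribution (the U value expected when there is no difference between the groups)
σ_U = standard error of the U value
n_1 = sample size of the group with the larger rank sum
n_2 = sample size of the group with the smaller rank sum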
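For the example data, this gives:
$$z = \frac{U-\mu_U}{\sigma_U} = \frac{19-\frac{12 \cdot 8}{2}}{\sqrt{\frac{12 \cdot 8 (12+8+1)}{12}}} = \frac{-29}{\sqrt{168}} \approx -2.24$$
This z-value is now compared with the critical value of the standard normal distribution. For the two-sided significance level .05, it is ±1.96. If the magnitude of the test statistic exceeds the critical value, the difference between the two groups is significant[8]. Here |z| ≈ 2.24 > 1.96, so the difference is significant at the .05 level.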
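The normal approximation can be sketched in Python using only the standard library. The function name `z_from_u` is hypothetical, and the sketch deliberately applies neither a tie correction nor a continuity correction, since the article does not describe either. Note also that the example has n_1 + n_2 = 20, below the stated threshold of 30, so the numbers should be read as an illustration of the arithmetic rather than a recommended analysis.

```python
import math


def z_from_u(u, n_1, n_2):
    """Standardize U with the mean and standard error of its null distribution."""
    mu_u = n_1 * n_2 / 2
    sigma_u = math.sqrt(n_1 * n_2 * (n_1 + n_2 + 1) / 12)
    return (u - mu_u) / sigma_u


z = z_from_u(19, 12, 8)
significant = abs(z) > 1.96  # two-sided test at the .05 level

# Two-sided p-value from the standard normal distribution.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 2), significant)  # -2.24 True
```

Comparing |z| with 1.96 and comparing the two-sided p-value with .05 are equivalent decisions; the p-value is included only because it is what most statistical software reports.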
Footnotes
[1] University of Zurich (2022)
[2] Robert, M., Allaire, D. (1988)
[3] Nachar, N. (2008), pp. 14-15
[4] University of Zurich (2022)
[5] Mann, H., Whitney, D. (1947)
[6] Divine, G., Norton, H., Baron, A., & Juarez, E. (2017), p. 279
[7] Nachar, N. (2008), pp. 14-17
[8] University of Zurich (2022)
References
- Divine, G., Norton, H., Baron, A., & Juarez, E. (2017). The Wilcoxon-Mann-Whitney Procedure Fails as a Test of Medians. The American Statistician, 72. doi:10.1080/00031305.2017.1305291
- Mann, H., Whitney, D. (1947). On a test of whether one of 2 random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 50-60
- Nachar, N. (2008). The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution. Tutorials in Quantitative Methods for Psychology, 13-17
- Robert, M., Allaire, D. (1988). Fondements et étapes de la recherche scientifique en psychologie. Saint‐Hyacinthe : Edisem et Paris : Maloine. Sedlmeier, P., & Gigerenzer
- University of Zurich (2022). Mann-Whitney-U-Test. https://www.methodenberatung.uzh.ch/de/datenanalyse_spss/unterschiede/zentral/mann.html#1.2._Voraussetzungen_des_Mann-Whitney-U-Tests
Author: Sven Korten