Mann-Whitney U test
The Mann-Whitney U test (also known as the Wilcoxon rank-sum test, WRS) tests whether the central tendencies of two independent samples differ. It is used when the requirements for a t-test for independent samples are not met. The question it poses is often summarized as: "Do the central tendencies of two independent samples differ?"
Differences from the t-test
The Mann-Whitney U test is the nonparametric equivalent of the independent-samples t-test and is used when the conditions for a parametric procedure are not met. Nonparametric procedures are also known as "prerequisite-free procedures" because they place fewer requirements on the distribution of the measured values in the population. For example, the data need not be normally distributed, and the variables need only be ordinally scaled. A Mann-Whitney U test can also be used with small samples and with data that contain outliers.[1]
Assumptions
Null hypothesis:
- The null hypothesis (H0) states that both groups under investigation are drawn from the same population.
- Under H0, the two independent groups are homogeneous and have the same distribution.
For a two-sided test, the alternative hypothesis H1, which is tested against the null hypothesis, is that the first population differs from the second population. If the test is significant, the null hypothesis is rejected in favor of H1.
Assumptions:
- The two groups studied must be drawn at random from the target population. Randomness here implies the absence of measurement and sampling error [2]. In practice, errors of these types may be present but must remain small.
- Every measurement must come from a different participant.
- The data are measured on an ordinal or continuous scale. The observed values are then of ordinal, relative, or absolute scale type [3].
- There is an independent variable by means of which the two groups to be compared are formed [4].
The Test
The Mann-Whitney U test first requires the calculation of a U statistic for each group. These statistics have a known distribution under the null hypothesis established by Mann and Whitney (1947)[5]. Mathematically, the Mann-Whitney U statistic is defined as follows [6]:
$$U_x = n_x n_y + \frac{n_x(n_x + 1)}{2} - R_x$$
And therefore for 2 groups:
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$
And:
$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$
Here nx is the number of observations/participants in the first group, ny the number of observations/participants in the second group, and Rx the sum of the ranks in group x. After calculating the U statistics and choosing a significance threshold (α), the null hypothesis is either rejected or retained.
H0 is rejected if, according to Mann and Whitney's tables, the p-value corresponding to min(Ux, Uy) (the smaller of the two calculated U values) is smaller than the specified α threshold. In technical terms, reject H0 if p of min(Ux, Uy) < α [7].
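The U formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `u_statistic` and the toy rank sums (ranks 1, 2, 4 in one group and 3, 5, 6, 7 in the other) are assumptions made up for the example.

```python
def u_statistic(n_own: int, n_other: int, rank_sum_own: float) -> float:
    """U for one group: n_own * n_other + n_own * (n_own + 1) / 2 - R_own."""
    return n_own * n_other + n_own * (n_own + 1) / 2 - rank_sum_own


# Hypothetical toy data: 7 pooled observations with ranks 1..7.
n_x, n_y = 3, 4
r_x = 1 + 2 + 4      # rank sum of group x
r_y = 3 + 5 + 6 + 7  # rank sum of group y

u_x = u_statistic(n_x, n_y, r_x)
u_y = u_statistic(n_y, n_x, r_y)
u_min = min(u_x, u_y)  # the value looked up in the tables

print(u_x, u_y, u_min)  # 11.0 1.0 1.0
```

A useful sanity check is that the two U values always add up to nx · ny (here 11 + 1 = 12 = 3 · 4).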
Example
20 patients of a hospital are examined. 12 of them are under cardiological treatment, while 8 are not. They all answer a questionnaire on general well-being (scores from 0 to 35, 0 representing very high, 35 very low well-being). The aim is to test whether there are differences in terms of central tendency of well-being between the cardiac patients and the other patients. The dataset to be analyzed contains, in addition to the subject number (ID), the grouping variable (Group), which takes the value 1 for cardiac patients and 2 for other patients, and the well-being value (Data).
The Mann-Whitney U test is based on the idea of ranking the data. That is, it is not calculated with the measured values themselves, but these are replaced by ranks with which the actual test is performed. Thus, the calculation of the test is based exclusively on the ordering of the data (greater than, less than). The absolute distances between the values are not taken into account.
In the first step, the measured values are sorted by magnitude (see the General well-being column). This ordering is independent of group membership. Next, ranks are assigned to the sorted values (starting from 1 and ascending), and each rank is entered in the column of the group to which the value belongs. If the same measured value occurs several times, each of the tied values receives the mean of the ranks they occupy (so-called "tied ranks"). Finally, rank sums are formed for the two groups by adding up the ranks within each group.
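The ranking step, including the averaging of tied ranks, might be sketched as follows. The helper names `average_ranks` and `rank_sums` and the four toy values are illustrative assumptions, not part of the original example.

```python
def average_ranks(values):
    """Return 1-based ranks in input order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sums(values, groups):
    """Add up the pooled ranks separately for each group label."""
    sums = {}
    for rank, group in zip(average_ranks(values), groups):
        sums[group] = sums.get(group, 0.0) + rank
    return sums


# Hypothetical toy data: the value 3 occurs twice and shares rank (2 + 3) / 2 = 2.5.
toy_values = [3, 1, 3, 7]
toy_groups = [1, 2, 2, 1]
print(average_ranks(toy_values))          # [2.5, 1.0, 2.5, 4.0]
print(rank_sums(toy_values, toy_groups))  # {1: 6.5, 2: 3.5}
```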
ID | Group | General well-being | Ranking Group 1 | Ranking Group 2 |
---|---|---|---|---|
5 | 1 | 0 | 1 | |
6 | 2 | 1 | | 2 |
14 | 2 | 2 | | 3 |
9 | 2 | 3 | | 4 |
18 | 2 | 4 | | 5 |
10 | 1 | 5 | 6 | |
19 | 1 | 5.5 | 7 | |
1 | 2 | 6 | | 8 |
8 | 2 | 6.5 | | 9 |
17 | 1 | 7 | 10 | |
15 | 2 | 7.5 | | 11 |
11 | 1 | 8 | 12 | |
3 | 2 | 8.5 | | 13 |
2 | 1 | 9 | 14 | |
20 | 1 | 11 | 15 | |
12 | 1 | 13 | 16 | |
16 | 1 | 28 | 17 | |
4 | 1 | 29 | 18 | |
7 | 1 | 32 | 19 | |
13 | 1 | 33 | 20 | |
Rank sum | | | 155 | 55 |
For group 1 the rank sum is 155 (n=12), for group 2 55 (n=8). To calculate U, the larger of the two rank sums is used.
n1 = sample size of the group with the larger rank sum. n2 = sample size of the group with the smaller rank sum. R1 = larger rank sum. Thus, it follows:
$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = 12 \cdot 8 + \frac{12 \cdot 13}{2} - 155 = 19$$
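As a cross-check, the rank sums and U values for the example can be recomputed from the well-being scores in the table. The sketch below relies on the fact that the example data contain no ties, so a plain sort is enough to assign ranks; the variable names are illustrative.

```python
# Well-being scores from the example table, split by group.
group_1 = [0, 5, 5.5, 7, 8, 9, 11, 13, 28, 29, 32, 33]  # cardiac patients
group_2 = [1, 2, 3, 4, 6, 6.5, 7.5, 8.5]                # other patients

# Rank the pooled values from 1 upward (valid here because all values are distinct).
pooled = sorted(group_1 + group_2)
rank_of = {value: position for position, value in enumerate(pooled, start=1)}

r_1 = sum(rank_of[v] for v in group_1)
r_2 = sum(rank_of[v] for v in group_2)
n_1, n_2 = len(group_1), len(group_2)

u_1 = n_1 * n_2 + n_1 * (n_1 + 1) / 2 - r_1
u_2 = n_1 * n_2 + n_2 * (n_2 + 1) / 2 - r_2

print(r_1, r_2)  # 155 55
print(u_1, u_2)  # 19.0 77.0
```

The group with the larger rank sum yields the smaller U (19), which is the value carried forward into the significance test.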
Significance
If the sample size is large enough (n1 + n2 > 30), significance can be tested with a normal approximation. For this, a z-value is calculated:
$$z = \frac{U - \mu_U}{\sigma_U} = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
μU = mean of the U distribution (the U value expected when there is no difference between the groups).
σU = standard error of the U value
n1= sample size of the group with the larger rank sum
n2= sample size of the group with the smaller rank sum
$$z = \frac{U - \mu_U}{\sigma_U} = \frac{19 - \frac{12 \cdot 8}{2}}{\sqrt{\frac{12 \cdot 8 (12 + 8 + 1)}{12}}} \approx -2.24$$
This z-value is now compared with the critical value of the standard normal distribution. For a two-sided significance level of .05, the critical value is ±1.96. If the magnitude of the test statistic is higher than the critical value, the difference between the two groups is significant [8]. In the example, the magnitude of z (about 2.24) exceeds 1.96.
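The z-approximation can be sketched with the Python standard library alone. The function names are illustrative, and the sketch assumes the plain formula given above, without any correction for ties or continuity; the two-sided p-value is obtained from the standard normal distribution via `math.erfc`.

```python
import math


def u_to_z(u: float, n_1: int, n_2: int) -> float:
    """z = (U - mu_U) / sigma_U, without tie or continuity correction."""
    mu_u = n_1 * n_2 / 2
    sigma_u = math.sqrt(n_1 * n_2 * (n_1 + n_2 + 1) / 12)
    return (u - mu_u) / sigma_u


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


z = u_to_z(19, 12, 8)         # values from the example
p = two_sided_p(z)
significant = abs(z) > 1.96   # two-sided test at the .05 level

print(round(z, 2))   # -2.24
print(significant)   # True
```

For the example values the two-sided p-value comes out at roughly 0.025, consistent with the comparison against ±1.96.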
References
- Divine, G., Norton, H., Baron, A., & Juarez, E. (2017). The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians. The American Statistician. 72. 10.1080/00031305.2017.1305291.
- Mann, H., Whitney, D. (1947). On a test of whether one of 2 random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 50‐60.
- Nachar, N. (2008). The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution. Tutorials in Quantitative Methods for Psychology. 13-17
- Robert, M., Allaire, D. (1988). Fondements et étapes de la recherche scientifique en psychologie. Saint‐Hyacinthe : Edisem et Paris : Maloine. Sedlmeier, P., & Gigerenzer.
- UZH (2022). Mann-Whitney-U-Test. University of Zurich.
Author: Sven Korten
- [1] UZH (2022). Mann-Whitney-U-Test.
- [2] Robert, M., Allaire, D. (1988). Fondements et étapes de la recherche scientifique en psychologie.
- [3] Nachar, N. (2008). The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution.
- [4] UZH (2022). Mann-Whitney-U-Test.
- [5] Mann, H., Whitney, D. (1947). On a test of whether one of 2 random variables is stochastically larger than the other.
- [6] Divine, G., Norton, H., Baron, A., & Juarez, E. (2017). The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians.
- [7] Nachar, N. (2008). The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution.
- [8] UZH (2022). Mann-Whitney-U-Test.