Mann-Whitney U test

From CEOpedia | Management online

The Mann-Whitney U test for independent samples tests whether the central tendencies of two independent samples differ. It is used when the requirements for a t-test for independent samples are not met. The question it poses is often abbreviated thus: "Do the central tendencies of two independent samples differ?"

Differences to the T-Test

The Mann-Whitney U test is the nonparametric counterpart of the independent-samples t-test and is used when the conditions for a parametric procedure are not met. Nonparametric procedures are also known as "prerequisite-free procedures" because they place weaker requirements on the distribution of the measured values in the population. For example, the data need not be normally distributed, and the variables need only be ordinally scaled. The Mann-Whitney U test can also be used with small samples and with data that contain outliers[1].

Assumptions for the Test

Null Hypothesis:

  • The null hypothesis assumes that both groups under investigation are drawn from the same population.
  • Under the null hypothesis, the two independent groups are homogeneous and have the same distribution.

In a two-sided test, the alternative hypothesis H1, which is tested against the null hypothesis, is that the first population differs from the second population. If the data support H1, the null hypothesis is rejected.

Assumptions for the test[2][3][4]:

  • The two groups studied must be drawn at random from the target population. Strictly speaking, randomness implies the absence of measurement and sampling error; in practice such errors may be present but must remain small.
  • Every measurement must come from a different participant.
  • The data are measured on an ordinal or continuous scale, that is, the observed values are of ordinal, relative, or absolute scale type.
  • There is an independent variable by means of which the two groups to be compared are formed.

The Test

The Mann-Whitney U test first requires the calculation of a U statistic for each group. Under the null hypothesis, these statistics have a known distribution, proposed by Mann and Whitney (1947)[5]. In mathematical terms, the U statistics for the two groups are defined as follows[6]:

$$U_x = n_x n_y + \frac{n_x (n_x + 1)}{2} - R_x$$

And:

$$U_y = n_x n_y + \frac{n_y (n_y + 1)}{2} - R_y$$

Here $n_x$ is the number of observations/participants in the first group, $n_y$ the number of observations/participants in the second group, and $R_x$ and $R_y$ are the sums of the ranks of the respective groups. After calculating the U statistics and choosing an appropriate significance threshold (α), the null hypothesis may or may not be rejected.

H0 is rejected if, according to Mann and Whitney's tables, the p-value corresponding to min(Ux, Uy) (the smaller of the two calculated U values) is below the specified α threshold. Formally, reject H0 if p of min(Ux, Uy) < α[7].
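The decision rule above can be sketched in code. The following is a minimal illustration, not the procedure of the cited sources: it assumes there are no tied values, it rebuilds the exact null distribution of U by a counting recursion rather than looking it up in printed tables, and it doubles the lower tail to obtain a two-sided p-value. The function names `count_arrangements`, `exact_two_sided_p`, and `reject_h0` are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_arrangements(m, n, u):
    """Number of orderings of m + n untied values in which U equals u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest value belongs either to the first group (adding n to U)
    # or to the second group (adding nothing to U).
    return count_arrangements(m - 1, n, u - n) + count_arrangements(m, n - 1, u)


def exact_two_sided_p(u_min, m, n):
    """Two-sided p-value for min(Ux, Uy) under H0, assuming no ties."""
    total = comb(m + n, m)  # all equally likely orderings under H0
    lower_tail = sum(count_arrangements(m, n, u) for u in range(int(u_min) + 1))
    return min(1.0, 2 * lower_tail / total)


def reject_h0(u_x, u_y, m, n, alpha=0.05):
    """Apply the rule: reject H0 if p of min(Ux, Uy) < alpha."""
    return exact_two_sided_p(min(u_x, u_y), m, n) < alpha


# Hypothetical case: two groups of 2 that are completely separated (U = 0 and 4).
print(exact_two_sided_p(0, 2, 2))  # 0.333..., so H0 cannot be rejected at .05
```

The hypothetical case shows why very small groups cannot reach significance: with two observations per group, even perfect separation yields a two-sided p-value of 1/3.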

Example of U-Test

20 patients of a hospital are examined. 12 of them are under cardiological treatment, while 8 are not. They all answer a questionnaire on general well-being (scores from 0 to 35, 0 representing very high, 35 very low well-being). The aim is to test whether there are differences in terms of central tendency of well-being between the cardiac patients and the other patients. The dataset to be analyzed contains, in addition to the subject number (ID), the grouping variable (Group), which takes the value 1 for cardiac patients and 2 for other patients, and the well-being value (Data).

The Mann-Whitney U test works by ranking the data. That is, it is not calculated with the measured values themselves, but these are replaced by ranks with which the actual test is performed. Thus, the calculation of the test is based exclusively on the ordering of the data (greater than, less than). The absolute distances between the values are not taken into account.
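That only the ordering matters can be checked with a small sketch. It uses the pair-counting form of U (the number of pairs in which a value from one group exceeds a value from the other, which is equivalent to the rank-sum form) and hypothetical scores that are not taken from this article's example; `pair_count_u` is an illustrative name.

```python
def pair_count_u(x, y):
    """Count pairs (x_i, y_j) with x_i > y_j; ties count one half."""
    total = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                total += 1.0
            elif xi == yj:
                total += 0.5
    return total


# Hypothetical scores, not taken from the example in this article.
group_a = [3, 5, 9]
group_b = [1, 4, 6]

# Same data, but the largest value is turned into an extreme outlier.
group_a_outlier = [3, 5, 9000]

u_original = pair_count_u(group_a, group_b)
u_outlier = pair_count_u(group_a_outlier, group_b)
mean_original = sum(group_a) / len(group_a)
mean_outlier = sum(group_a_outlier) / len(group_a_outlier)

print(u_original, u_outlier)        # 6.0 6.0, the ordering is unchanged
print(mean_original, mean_outlier)  # the mean shifts strongly
```

The outlier changes the group mean, on which a t-test relies, but leaves U untouched because no observation changed its position in the ordering.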

In the first step, the measured values are sorted by magnitude, regardless of group membership (see the General well-being column). Ranks are then assigned in ascending order, starting from 1, and each rank is entered in the column of the group to which the observation belongs. If the same measured value occurs several times, the tied observations each receive the mean of the ranks they occupy (so-called "tied ranks"). Finally, rank sums are formed for the two groups by adding up the ranks within each group.
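The handling of repeated values can be sketched as follows. The helper `midranks` and the sample scores are hypothetical; the function assigns each tied observation the mean of the rank positions that the tied run occupies.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start+1 .. end+1 are occupied by this run of ties.
        shared_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


scores = [7, 2, 7, 4, 2, 7]  # hypothetical values with ties
print(midranks(scores))      # [5.0, 1.5, 5.0, 3.0, 1.5, 5.0]
```

The two values of 2 occupy positions 1 and 2 and therefore both receive rank 1.5; the three values of 7 occupy positions 4 to 6 and each receive rank 5.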

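Example Data

| ID | Group | General well-being | Ranking Group 1 | Ranking Group 2 |
|----|-------|--------------------|-----------------|-----------------|
| 5 | 1 | 0 | 1 | |
| 6 | 2 | 1 | | 2 |
| 14 | 2 | 2 | | 3 |
| 9 | 2 | 3 | | 4 |
| 18 | 2 | 4 | | 5 |
| 10 | 1 | 5 | 6 | |
| 19 | 1 | 5.5 | 7 | |
| 1 | 2 | 6 | | 8 |
| 8 | 2 | 6.5 | | 9 |
| 17 | 1 | 7 | 10 | |
| 15 | 2 | 7.5 | | 11 |
| 11 | 1 | 8 | 12 | |
| 3 | 2 | 8.5 | | 13 |
| 2 | 1 | 9 | 14 | |
| 20 | 1 | 11 | 15 | |
| 12 | 1 | 13 | 16 | |
| 16 | 1 | 28 | 17 | |
| 4 | 1 | 29 | 18 | |
| 7 | 1 | 32 | 19 | |
| 13 | 1 | 33 | 20 | |
| Rank sum | | | 155 | 55 |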

For group 1 the rank sum is 155 (n = 12); for group 2 it is 55 (n = 8). To calculate U, the larger of the two rank sums is used.
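The rank sums can be reproduced from the example data with a short sketch. The tuple layout and variable names are illustrative choices; because no well-being score occurs twice in this data set, the rank of each observation is simply its position in the sorted list.

```python
# (ID, group, well-being score) as listed in the example data.
observations = [
    (5, 1, 0), (6, 2, 1), (14, 2, 2), (9, 2, 3), (18, 2, 4),
    (10, 1, 5), (19, 1, 5.5), (1, 2, 6), (8, 2, 6.5), (17, 1, 7),
    (15, 2, 7.5), (11, 1, 8), (3, 2, 8.5), (2, 1, 9), (20, 1, 11),
    (12, 1, 13), (16, 1, 28), (4, 1, 29), (7, 1, 32), (13, 1, 33),
]

# Sort by score regardless of group; no ties occur, so rank = position.
ordered = sorted(observations, key=lambda row: row[2])

rank_sums = {1: 0, 2: 0}
group_sizes = {1: 0, 2: 0}
for rank, (_id, group, _score) in enumerate(ordered, start=1):
    rank_sums[group] += rank
    group_sizes[group] += 1

print(rank_sums)    # {1: 155, 2: 55}
print(group_sizes)  # {1: 12, 2: 8}
```

As a plausibility check, the two rank sums add up to 210, which is the sum of the ranks 1 to 20.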

n1 = sample size of the group with the larger rank sum. n2 = sample size of the group with the smaller rank sum. R1 = larger rank sum. Thus, it follows:

$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 = 12 \cdot 8 + \frac{12 \cdot 13}{2} - 155 = 19$$
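The arithmetic for the example can be checked with a short sketch. It assumes the standard rank-sum form U = n1·n2 + n1(n1 + 1)/2 − R1; the helper name `u_from_rank_sum` is hypothetical.

```python
def u_from_rank_sum(n_this, n_other, rank_sum_this):
    """U = n1*n2 + n1*(n1 + 1)/2 - R1 for the group with rank sum R1."""
    return n_this * n_other + n_this * (n_this + 1) / 2 - rank_sum_this


u_group1 = u_from_rank_sum(12, 8, 155)  # group with the larger rank sum
u_group2 = u_from_rank_sum(8, 12, 55)   # group with the smaller rank sum

print(u_group1)             # 19.0
print(u_group2)             # 77.0
print(u_group1 + u_group2)  # 96.0, which equals 12 * 8
```

Using the larger rank sum yields the smaller of the two U values (19), and the two values always add up to the product of the group sizes.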

Testing of Significance

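If the sample size is large enough (n1 + n2 > 30), significance can be tested with a normal approximation. Here z is calculated:

$$z = \frac{U - \mu_U}{\sigma_U}, \qquad \mu_U = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

μ_U = mean of the U distribution (the U value expected when there is no difference between the groups).

σ_U = standard error of the U value.

n1 = sample size of the group with the larger rank sum.

n2 = sample size of the group with the smaller rank sum.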

This z-value is now compared with the critical value of the standard normal distribution. For the two-sided significance level .05, it is ±1.96. If the magnitude of the test statistic exceeds the critical value, the difference between the two groups is significant[8].
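A minimal sketch of this comparison follows. It assumes the usual normal approximation with mean n1·n2/2 and standard error sqrt(n1·n2·(n1 + n2 + 1)/12), without tie or continuity correction, and plugs in U = 19 with group sizes 12 and 8 from the example. Because the example has only 20 observations, which is below the n1 + n2 > 30 guideline, the numbers illustrate the arithmetic only. The function name `u_to_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def u_to_z(u, n1, n2):
    """z = (U - mu_U) / sigma_U, without tie or continuity correction."""
    mu_u = n1 * n2 / 2
    sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu_u) / sigma_u


z = u_to_z(19, 12, 8)  # U and group sizes from the example
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(round(z, 2))    # -2.24
print(abs(z) > 1.96)  # True
```

Here the magnitude of z exceeds 1.96, so under the approximation the difference would be judged significant at the two-sided .05 level.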

Footnotes

  1. University of Zurich (2022)
  2. Robert, M., Allaire, D. (1988)
  3. Nachar, N. (2008), pp. 14-15
  4. University of Zurich (2022)
  5. Mann, H., Whitney, D. (1947)
  6. Divine, G., Norton, H., Baron, A., & Juarez, E. (2017), p. 279
  7. Nachar, N. (2008), pp. 14-17
  8. University of Zurich (2022)



References

Author: Sven Korten