Central tendency

From CEOpedia

Central tendency refers to a statistical measure identifying a central or typical value within a probability distribution. Measures of central tendency, colloquially called averages, describe a dataset by locating its central position[1]. As summary statistics, they condense information from many observations into a single representative value.

Historical development

The concept of averaging observations has ancient roots. A form of arithmetic mean was used in early times, and the mode could be recognized in certain historical examples. The Greek historian Thucydides described a method resembling what statisticians now call the midrange.

The 19th century marked the formalization of central tendency measures amid the rise of statistical science. Francis Galton, working during the 1880s, advocated the median over the arithmetic mean for skewed distributions[2]. He introduced the term "median" in 1881, having earlier used "middle-most value" in 1869 to describe a central point resistant to outliers in anthropometric data.

English mathematician Karl Pearson contributed significantly to formalizing these measures. In 1895, he stated: "I have found it convenient to use the term mode for the abscissa corresponding to the ordinate of maximum frequency"[3]. That same year, Pearson distinguished the mean, median, and mode in his paper on skew variation, establishing their roles in describing homogeneous material.

The term "central tendency" itself dates from the late 1920s. Carl Friedrich Gauss had earlier taken as axiomatic in his 1809 Theoria Motus that the arithmetic mean provides the most probable value of an unknown quantity. By the mid-19th century, mathematicians including Pascal, Bernoulli, De Moivre, Simpson, Laplace, Gauss, and Quetelet had developed probability concepts and measures of central tendency alongside recognition of the normal distribution's wide applicability.

The arithmetic mean

The arithmetic mean, commonly called simply "the mean" or "the average," equals the sum of all values divided by the total number of values. It is the most commonly used measure of central tendency[4]. When people reference an "average" in everyday conversation, they typically mean the arithmetic mean.

Calculation proceeds straightforwardly: add all observations and divide by the count. For a dataset containing values 3, 5, 7, 8, and 12, the mean equals (3+5+7+8+12)/5 = 35/5 = 7.
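The calculation above can be sketched in Python. The helper name arithmetic_mean is an illustrative assumption, not part of any library; the standard library's statistics.mean is shown alongside it for comparison.

```python
from statistics import mean


def arithmetic_mean(values):
    """Return the sum of the values divided by their count (hypothetical helper)."""
    if len(values) == 0:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


data = [3, 5, 7, 8, 12]          # the dataset used in the text
by_hand = arithmetic_mean(data)  # (3 + 5 + 7 + 8 + 12) / 5
library = mean(data)             # standard-library equivalent
print(by_hand, library)
```

Both calls should agree with the value of 7 worked out in the text.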

The mean incorporates every value in its calculation, so it uses all available information. A change to any single observation changes the mean, which makes it responsive to the full dataset.

However, this same property creates vulnerability to extreme values. A single very large or very small observation can dramatically shift the mean away from the typical experience of most observations. Outliers distort the mean, potentially making it unrepresentative of central tendency for skewed distributions.

The median

The median is the value occupying the middle position when all observations are arranged in ascending or descending order. It divides the frequency distribution exactly into two halves[5]. Fifty percent of observations fall at or below the median, making it equivalent to the 50th percentile.

Finding the median requires first sorting the observations. For a dataset with an odd number of observations, the median is the middle value: for five values arranged as 3, 5, 7, 8, 12, the median is 7. For a dataset with an even number of observations, the median is conventionally the mean of the two middle values: for 3, 5, 7, 8, it is (5+7)/2 = 6.
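A minimal sketch of this procedure follows. The function name median_of is a hypothetical helper chosen for illustration, and the even-count branch assumes the common convention of averaging the two middle values.

```python
from statistics import median


def median_of(values):
    """Sort the values, then take the middle one (or average the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two values around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median_of([12, 3, 8, 5, 7])   # unsorted input is sorted first
even_case = median_of([3, 5, 7, 8])
print(odd_case, even_case)
print(median([12, 3, 8, 5, 7]), median([3, 5, 7, 8]))  # library cross-check
```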

The median demonstrates resistance to outliers. Extreme values at either end do not shift the median significantly. This property makes the median preferred for skewed distributions where outliers would distort the mean. Income distributions exemplify this situation; high earners inflate the mean, but the median better represents typical experience.
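The income example can be illustrated with a small sketch. The salary figures below are invented purely for illustration and do not come from any survey.

```python
from statistics import mean, median

# Hypothetical annual incomes: five typical earners and one very high earner.
typical = [30_000, 32_000, 35_000, 38_000, 40_000]
with_high_earner = typical + [1_000_000]

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_high_earner), median(with_high_earner)

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

In this made-up sample, adding one extreme value moves the mean from 35,000 to roughly 195,800, while the median moves only from 35,000 to 36,500.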

Francis Galton preferred median values in his statistical analyses. He employed the median and the median absolute error where contemporary statisticians would use the mean and root mean squared error. He justified this preference by analogy with simple majority voting procedures.

The mode

The mode is defined as the value occurring most frequently in a dataset. Some datasets lack a mode because each value occurs only once. Others have multiple modes: a dataset in which two values tie for the highest frequency is called bimodal.
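The three cases described above (one mode, several modes, no mode) can be sketched as follows. The function modes is a hypothetical helper; returning an empty list when every value occurs once follows the convention used in this article, whereas Python's own statistics.multimode would return every value in that case.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency (hypothetical helper)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        # Every value occurs exactly once: treated here as "no mode".
        return []
    # Counter keeps first-seen order, so modes come out in that order.
    return [value for value, count in counts.items() if count == top]


single = modes([2, 4, 4, 6, 9])        # one value is most frequent
bimodal = modes([1, 2, 2, 3, 3, 4])    # two values tie for most frequent
no_mode = modes([3, 5, 7, 8, 12])      # all values are distinct
print(single, bimodal, no_mode)
```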

Karl Pearson formalized the mode concept in 1895. He defined it as the value corresponding to the maximum frequency in a distribution. This measure captures what is most common rather than what is arithmetically central.

Of the three measures, only the mode applies to nominal (categorical) data, where arithmetic operations are meaningless. For a survey asking favorite colors, calculating a mean makes no sense, but identifying the most frequently chosen color provides useful central tendency information.
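As a brief illustration of the survey example, the sketch below counts categorical responses. The response list is invented for illustration only.

```python
from collections import Counter

# Hypothetical survey responses: favorite colors are labels, not numbers.
responses = ["blue", "red", "blue", "green", "blue", "red", "green"]

counts = Counter(responses)
modal_color, modal_count = counts.most_common(1)[0]
print(modal_color, modal_count)
```

No ordering or arithmetic on the labels is needed; the mode depends only on how often each category appears.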

Modes may not uniquely identify distributions. Two very different datasets might share the same mode while differing substantially in other characteristics. This limitation reduces the mode's usefulness as a standalone summary measure.

Comparison and selection

In symmetrical distributions with a single peak, the mean, median, and mode coincide. A perfect normal distribution has all three measures at the same point. Symmetry makes measure selection less consequential.

Skewed distributions cause these measures to diverge. In right-skewed distributions, the mean typically exceeds the median, which typically exceeds the mode. Left-skewed distributions typically reverse this ordering. A wider separation between the measures generally indicates stronger skewness.
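The orderings described above can be checked on small samples. The three datasets below are invented for illustration, and the ordering they show is a common tendency rather than a guaranteed rule for every skewed dataset.

```python
from statistics import mean, median, mode


def summarize(values):
    """Return (mean, median, mode) for a dataset (hypothetical helper)."""
    return mean(values), median(values), mode(values)


symmetric = [1, 2, 3, 3, 3, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]   # long tail toward large values
left_skewed = [20 - x for x in right_skewed]      # mirror image of the above

for name, values in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)]:
    print(name, summarize(values))
```

In these samples the symmetric dataset has all three measures at 3, the right-skewed one has mean 5, median 3, and mode 2, and the mirrored dataset has mean 15, median 17, and mode 18.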

Selection among measures depends on data characteristics and analytical purposes. The mean is preferred when data are approximately symmetrical without significant outliers. Its mathematical properties, including connections to variance and standard deviation, support further statistical analysis.
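One way to see the connection to variance and standard deviation is that both are built from deviations around the mean. The sketch below uses the population form (dividing by the number of observations); the sample form, which divides by one less than the count, is an equally common convention.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

data = [3, 5, 7, 8, 12]
center = mean(data)

# Population variance: the average squared deviation from the mean.
squared_deviations = [(x - center) ** 2 for x in data]
variance = sum(squared_deviations) / len(data)
std_dev = sqrt(variance)

print(variance, std_dev)
print(pvariance(data), pstdev(data))  # standard-library cross-check
```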

The median suits skewed data or situations where outliers are genuine observations that should not simply be discarded. Reported median home prices, for instance, better represent typical transactions than means inflated by luxury properties. The median also applies to ordinal data where values can be ranked but arithmetic operations are inappropriate.

The mode serves categorical data analysis and identifies the most common category. It also helps characterize multimodal distributions where multiple peaks indicate distinct subgroups within data.

Applications in management and business

Central tendency measures find extensive application in organizational settings. Performance metrics often report means for efficiency indicators. Quality control uses means to track process centering.

Compensation analysis frequently employs medians. Salary surveys report median wages because outlier executives would distort mean figures. Understanding where positions fall relative to market medians supports competitive compensation decisions.

Customer analytics may identify modal preferences for product design. Market segmentation relies on understanding what is most typical within customer groups. Sales forecasting uses averages to project future demand.

Financial reporting incorporates various averages. Average inventory, average receivables, and other metrics inform management decision-making. Understanding which central tendency measure underlies reported figures aids proper interpretation.

References

  • Galton, F. (1881). Report of the Anthropometric Committee. Report of the British Association for the Advancement of Science, 51, 225-272.
  • Pearson, K. (1895). Contributions to the mathematical theory of evolution: II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society A, 186, 343-414.
  • Weisberg, H.F. (1992). Central Tendency and Variability. Sage Publications.
  • Stigler, S.M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Harvard University Press.

Footnotes

  1. Central tendency measures identify central or typical values within datasets and serve as summary statistics.
  2. Francis Galton advocated the median for skewed distributions and introduced the term "median" in 1881.
  3. Karl Pearson formally defined the mode in 1895 as the value corresponding to maximum frequency.
  4. The arithmetic mean is the most commonly used measure of central tendency, calculated by summing values and dividing by count.
  5. The median divides distributions into two equal halves and resists distortion from outliers.

Author: Slawomir Wawak