Descriptive statistics

From CEOpedia | Management online

Descriptive statistics are a specific means of describing, showing, and summarizing observations or raw data, with an emphasis on illustrations such as charts and graphs (Jonker, 2010).

Put simply, descriptive statistics are analyses that condense, characterize, and present data in a way that makes them easier to comprehend. By offering succinct insights into the data, which can help reveal trends, they help us understand and describe the features of a particular set of information (Conner et al., 2017).

These summaries can be quantitative, such as summary statistics, or graphical, such as easily readable charts. They may serve as the preliminary description of the data within a larger statistical analysis, or they may be sufficient on their own for research that only aims to gather information on a specific topic. Statistics as a whole is the science of gathering, organizing, analyzing, and interpreting data for the purposes of description and decision-making (Kaushik et al., 2014).

Kaushik et al. give some examples of statistics in their article: "Hang Seng Index, Life or car insurance rate, Unemployment rate, Consumer Price Index, etc." (2014, p. 1188). They also distinguish two categories of statistical methodology (Kaushik et al., 2014):

  • Descriptive Statistics - this methodology uses graphs and charts to represent the data visually, and then explains them with the help of numbers.
  • Inferential Statistics - this methodology uses techniques that draw conclusions about the whole set of data (the population) from the insights gathered from a subset of that population (the sample).

Graphs, charts, and quantitative data are frequently used in the summary. Descriptive statistics are occasionally the only analysis carried out in a scientific study or evidence-based research project; however, they rarely help us draw conclusions about previously stated hypotheses. Rather, they are used as preliminary data that can serve as the basis for additional investigation, by outlining initial issues or highlighting points that need critical analysis in more extensive studies (Conner et al., 2017).

Measures of Central Tendency

In descriptive statistics, the most commonly used measures are the measures of central tendency. This group of metrics is used in most types of studies (mathematical, evidence-based, quality, etc.) and includes the mean, the median, and the mode. These metrics describe the center of a data set's frequency distribution (Conner et al., 2017).

The main measures of central tendency are described in more detail below (Mishra et al., 2017):

  • Mean - also simply called the average, it is the arithmetic average of a data set. The mean is calculated by summing all the observations in the data set and dividing the result by the total number of observations. Its value is unique for a given data set, meaning that there is only one mean. This is useful when comparing two groups of observations.
  • Median - the observation in the middle of the data set once the data are arranged in increasing or decreasing order. When the number of observations is even, the median is conventionally taken as the mean of the two middle values. As with the mean, there is only one median for any group of data, which makes it useful when comparing two different sets of observations.
  • Mode - the value that appears most often in a group of observations. A single data set can have several modes, which makes this metric less suitable for comparing two different groups of observations.
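
The three measures above can be illustrated with a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
import statistics

# Hypothetical observations, used only for illustration.
observations = [2, 3, 3, 5, 7, 10]

# Mean: sum of all observations divided by the number of observations.
mean_value = statistics.mean(observations)

# Median: middle of the sorted data; with six values it is the mean of 3 and 5.
median_value = statistics.median(observations)

# Mode(s): every value tied for the highest frequency, returned as a list.
mode_values = statistics.multimode(observations)

print(mean_value, median_value, mode_values)

# A data set can have more than one mode, unlike the mean and the median.
bimodal = [1, 1, 2, 2, 3]
bimodal_modes = statistics.multimode(bimodal)
print(bimodal_modes)
```

Here `multimode` is used rather than `mode` because it returns all tied values, which makes the "many modes" case visible.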

Measures of Dispersion

Measures of dispersion form another group of metrics; they show how much variation (the spread of the data) there is in a group of observations. More precisely, they show how far the observations lie from the central tendency, which is usually represented by the mean or the median. These metrics help determine whether the data are homogeneous or heterogeneous (Mishra et al., 2017).

Common measures in this group are (Conner et al., 2017):

  • Standard deviation - a measure of the data's degree of dispersion with respect to the mean. For a sample, it is calculated by subtracting the mean from each value, squaring each difference, summing the squared differences, dividing the sum by the number of values minus one, and finally taking the square root of the result.
  • Variance - this measure quantifies how far the numbers in the data set lie from the mean (average): it is the average of the squared differences from the mean, and the standard deviation is its square root. If the variance and standard deviation are high, the spread of the points in the data set is large; the converse also holds (low variance and standard deviation mean low spread).
  • Quartiles - three points: the lower quartile Q1 (first quartile), the median Q2 (second quartile), and the upper quartile Q3 (third quartile). These three points divide the ordered data set into four parts, each containing roughly the same number of observations.
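
As a hedged illustration of the measures above, the following sketch computes the sample variance and standard deviation step by step and checks them against Python's standard `statistics` module. The data values are hypothetical. Note that several conventions exist for computing quartiles; `statistics.quantiles` uses its default "exclusive" method here, and other tools may return slightly different Q1 and Q3 values.

```python
import math
import statistics

# Hypothetical observations, used only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean of the data set.
mean_value = statistics.mean(values)

# Step 2: squared difference of each value from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Step 3: sample variance = sum of squared deviations / (n - 1).
sample_variance = sum(squared_deviations) / (len(values) - 1)

# Step 4: sample standard deviation = square root of the variance.
sample_std = math.sqrt(sample_variance)

# Cross-check the manual steps against the library functions.
assert math.isclose(sample_variance, statistics.variance(values))
assert math.isclose(sample_std, statistics.stdev(values))

# Quartiles: three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(sample_variance, sample_std)
print(q1, q2, q3)
```

The second quartile coincides with the median, which is a quick sanity check on any quartile computation.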

Graphically Displaying Different Types Of Data

Different sorts of data can be displayed visually in a variety of ways. Although there is often flexibility in the choice of format, the most straightforward and understandable one is usually the one chosen (Vetter, 2017). As Vetter reports in his article, "Common examples of graphics include a stem-and-leaf plot, histogram, bar chart, pie graph, line graph, scatter plot, and box-and-whisker plot" (2017, p. 4).
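
One of the graphics named in the quotation, the stem-and-leaf plot, is simple enough to sketch in plain text. The function below is a minimal illustration, not a standard library routine: the name `stem_and_leaf` and the sample scores are hypothetical, and the sketch assumes non-negative integers with the last digit used as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a simple stem-and-leaf plot for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):
        # Leading digit(s) form the stem, the units digit forms the leaf.
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)

    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Hypothetical scores, used only for illustration.
scores = [12, 15, 21, 23, 23, 28, 35, 41, 47]
plot_lines = stem_and_leaf(scores)
print("\n".join(plot_lines))
```

Each row shows one stem followed by its leaves, so the row lengths give a rough picture of the distribution, much like a histogram turned on its side.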



References

Author: Claudio Mameli