# Box diagram

A box diagram (also box-and-whisker diagram, box plot) is a graphical method for depicting statistical data through their quartiles (Lee 200, s. 106; Mosler 2006, s. 33). This kind of plot is useful in data analysis (Tukey 1977, s. 531). It conveys information about the shape and dispersion of the empirical distribution. It cannot be used for nominal or grouped ordinal data (Hanneman 2012, s. 154).

A box diagram can be created by statistical programs as a specialized chart. In data analysis, automatic generation of the box diagram provides information on the distribution of the collected statistical data without the need to calculate further indicators. The box diagram visualizes, among other things, the quartiles, the extreme values, and the median (Day 2007, s. 437).
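The quantities that a box diagram visualizes can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the function name `five_number_summary` and the sample values are illustrative assumptions, and the `"inclusive"` interpolation method is only one of several quartile conventions, so other programs may report slightly different quartiles for the same data.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" treats the data as the whole population and
    # interpolates linearly between neighbouring observations.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


# Made-up illustrative data, not taken from the cited sources.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]
print(five_number_summary(sample))  # (2, 4.0, 7.0, 9.0, 12)
```

These five values are enough to draw the box, the median line, and whiskers that reach the extreme values.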

## Construction

Fig.1. Box diagram

A box diagram includes (Wickham 2011, s.2):

• the middle value, called the median (Md) or second quartile (Q2), which for data grouped into class intervals is calculated as:

$$Q_2=x_{Q_2}+(N_{Q_2}-n_{isk-1})\times{h_{Q_2}\over n_{Q_2}}$$

where:

$$x_{Q_2}$$ - lower bound of the interval containing the second quartile,

$$N_{Q_2}$$ - position of the second quartile,

$$n_{isk-1}$$ - cumulative number of observations in the intervals preceding the interval containing the second quartile,

$$h_{Q_2}$$ - the span (width) of the interval containing the second quartile,

$$n_{Q_2}$$ - number of observations in the interval containing the second quartile,

• quartiles - lower (Q1) and upper (Q3), calculated analogously, with each symbol referring to the interval containing the respective quartile:

$$Q_1=x_{Q_1}+(N_{Q_1}-n_{isk-1})\times{h_{Q_1}\over n_{Q_1}}$$

$$Q_3=x_{Q_3}+(N_{Q_3}-n_{isk-1})\times{h_{Q_3}\over n_{Q_3}}$$

• extreme values - minimum (L) and maximum (H),
• two whiskers combining extreme values with the box,
• outside-the-range values, also called outliers.

Fig.2. Box diagram - example
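The interpolation formula for quartiles of grouped data listed among the elements above can be sketched in code. This is a hedged illustration rather than a definitive implementation: it assumes equal-width class intervals, takes the quartile position as k times the total count divided by four, and the function name `grouped_quartile` and the example counts are hypothetical.

```python
def grouped_quartile(lower_bounds, width, counts, k):
    """Interpolated k-th quartile (k = 1, 2, 3) for equal-width class intervals."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    total = sum(counts)
    position = k * total / 4  # position of the quartile in the ordered data
    cumulative = 0            # observations in the intervals seen so far
    for lower, count in zip(lower_bounds, counts):
        if count > 0 and cumulative + count >= position:
            # lower bound + (position - preceding cumulative count) * width / count
            return lower + (position - cumulative) * width / count
        cumulative += count
    raise ValueError("counts must contain at least one observation")


# Hypothetical grouped data: intervals [0,10), [10,20), [20,30), [30,40).
bounds = [0, 10, 20, 30]
counts = [5, 15, 20, 10]
for k in (1, 2, 3):
    print(k, grouped_quartile(bounds, 10, counts, k))
```

With these made-up counts the sketch yields 15.0, 22.5 and 28.75 for Q1, Q2 and Q3 respectively.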

Boxes are horizontal or vertical rectangles. In a horizontal box, the left side is defined by the first quartile (Q1) and the right side, analogously, by the third quartile (Q3). The second quartile (Q2), also called the median (Md), is depicted as a vertical line inside the box; it is the middle value of the data set (Welkowitz 2006).

Whiskers are the lines extending from the box, and they can be constructed in two ways. The first way is to draw lines from the minimum value (L) to the box and from the box to the maximum value (H) (Bay-Wiliams 2004, s.90). The second way is to limit the length of each whisker to one and a half times the interquartile range (IQR). The interquartile range is found by sorting the values in increasing order and taking the spread of the middle part of the data, between the lowest 25% and the highest 25%, that is, IQR = Q3 - Q1 (Vaughan 2001, s. 35). Values outside this range are depicted as points (DeVor 2007, s. 83). The whisker limits are given by the following equations:

• $$\text{maximum} = Q_3+1.5\times IQR$$,
• $$\text{minimum} = Q_1-1.5\times IQR$$.
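The two limits above translate directly into a check for outside-the-range values. The sketch below is one possible reading, assuming the quartiles come from Python's `statistics.quantiles` with the `"inclusive"` method; the function names `whisker_limits` and `find_outliers`, the `factor` parameter, and the sample data are hypothetical.

```python
import statistics


def whisker_limits(values, factor=1.5):
    """Return (lower limit, upper limit) based on factor * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1 - factor * iqr, q3 + factor * iqr


def find_outliers(values, factor=1.5):
    """Return the values lying outside the whisker limits."""
    low, high = whisker_limits(values, factor)
    return [v for v in values if v < low or v > high]


# Made-up illustrative data with one unusually large value.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 30]
print(whisker_limits(sample))  # (-3.5, 16.5)
print(find_outliers(sample))   # [30]
```

Here Q1 is 4 and Q3 is 9, so the IQR is 5 and only the value 30 falls beyond the upper limit of 16.5.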

When more than one box is drawn, the spacing between the parts of each box reflects the degree of dispersion and the skewness of the data. Box diagrams are helpful in identifying outliers (McKenzie 2014, s.44).

## References

Author: Dominika Paś