The five point summary (also called the five-number summary) gives a quick, high-level view of how a data set is distributed.
- Percentile: The ‘p’th percentile of a data set is the value at or below which ‘p’ percent of the data fall. The 25th percentile is called the 1st quartile, the 50th percentile the median (or 2nd quartile), and the 75th percentile the 3rd quartile.
- Hinges: The Lower Hinge is the median of the lower half of the sorted data, up to and including the median of the full data. The Upper Hinge is the median of the upper half of the sorted data, from the median of the full data upward, inclusive. Note that when the number of elements is even, the median is the average of the two middle values and is not itself part of the original data set, so it is not included in either half.
- Quantile: The ‘q’th quantile of a data set is the value at or below which a fraction ‘q’ of the data fall. For example, the 0.25 quantile is the 25th percentile.
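The hinge definition above can be checked by hand. The following is a minimal sketch, not how R implements it internally: `hinges_by_hand` is a hypothetical helper written for illustration, and it assumes a non-empty numeric vector with no missing values.

```r
# Hypothetical helper (not part of base R): hinges as medians of the two halves.
hinges_by_hand <- function(x) {
  x <- sort(x)
  n <- length(x)
  half <- ceiling(n / 2)         # for odd n, each half includes the overall median
  lower <- x[1:half]
  upper <- x[(n - half + 1):n]
  c(lower = median(lower), upper = median(upper))
}

x <- 1:9
hinges_by_hand(x)                          # lower = 3, upper = 7
fivenum(x)[c(2, 4)]                        # 3 and 7, the hinges reported by fivenum()
quantile(x, probs = c(0.25, 0.50, 0.75))   # 3, 5, 7 for this particular data set
```

For this small odd-length example the hinges and the quartiles coincide. That is not true in general, which is why fivenum() and summary() can report slightly different values for the same data.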
R provides two functions for this: fivenum() and summary().
fivenum(): Returns Tukey’s five number summary (minimum, lower-hinge, median, upper-hinge, maximum) for the input data.
summary(): Returns minimum, 1st quartile, median, mean, 3rd quartile, maximum for the input data.
> myarr <- array(1:100, dim=c(100,1))
> summary(myarr)
Min. : 1.00
1st Qu.: 25.75
Median : 50.50
Mean : 50.50
3rd Qu.: 75.25
Max. :100.00
> fivenum(myarr)
[1] 1.0 25.5 50.5 75.5 100.0
> myarr <- array(0:100, dim=c(101,1))
> summary(myarr)
Min. : 0
1st Qu.: 25
Median : 50
Mean : 50
3rd Qu.: 75
Max. :100
> fivenum(myarr)
[1] 0 25 50 75 100
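The two functions disagree for 1:100 (25.75 versus 25.5) but agree for 0:100. A likely explanation is that summary() takes its quartiles from quantile(), whose default rule (type 7) interpolates at position 1 + (n - 1) * p in the sorted data, while fivenum() uses hinges. The sketch below reproduces the arithmetic on plain vectors. The variable names are illustrative, and it assumes the default quantile type has not been changed.

```r
# First quartile of 1:100 under the default quantile rule (type 7).
x <- 1:100
n <- length(x)
h <- 1 + (n - 1) * 0.25                  # 25.75: between the 25th and 26th values
lo <- floor(h)
q1 <- x[lo] + (h - lo) * (x[lo + 1] - x[lo])
q1                                       # 25.75, matching summary()
quantile(x, 0.25, names = FALSE)         # 25.75
fivenum(x)[2]                            # 25.5, the median of 1..50

# With 0:100 (n = 101) the position is a whole number, so no interpolation occurs.
y <- 0:100
1 + (length(y) - 1) * 0.25               # 26, so the quartile is y[26] = 25
quantile(y, 0.25, names = FALSE)         # 25
fivenum(y)[2]                            # 25
```

When n - 1 is a multiple of 4 the quartile positions land exactly on data points, which is why both functions return the same values in the second example.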