Introduction to fdth

Overview

fdth builds frequency distribution tables (fdt) and their associated graphics from vectors, data frames, and matrices for both numerical and categorical variables.

Core functions:

Function	Purpose
`fdt()`	Frequency table for numerical data
`fdt_cat()`	Frequency table for categorical data
`make.fdt()`	Reconstruct a table from frequencies alone
`make.fdt_cat()`	Reconstruct a categorical table from frequencies
`mfv()`	Most frequent value (mode)
`sd()` / `var()`	Standard deviation / variance for grouped data

library(fdth)
#> 
#> Attaching package: 'fdth'
#> The following objects are masked from 'package:stats':
#> 
#>     sd, var

1. Numerical data — `fdt()`

1.1 Basic usage

set.seed(42)
x <- rnorm(200,
           mean = 10,
           sd = 2)

ft <- fdt(x)
ft
#>     Class limits  f   rf rf(%)  cf cf(%)
#>  [3.9737,5.2608)  4 0.02   2.0   4   2.0
#>  [5.2608,6.5479)  4 0.02   2.0   8   4.0
#>  [6.5479,7.8351) 20 0.10  10.0  28  14.0
#>  [7.8351,9.1222) 36 0.18  18.0  64  32.0
#>  [9.1222,10.409) 57 0.28  28.5 121  60.5
#>  [10.409,11.696) 44 0.22  22.0 165  82.5
#>  [11.696,12.984) 25 0.12  12.5 190  95.0
#>  [12.984,14.271)  8 0.04   4.0 198  99.0
#>  [14.271,15.558)  2 0.01   1.0 200 100.0

The default table has six columns:

Column	Description
`Class limits`	Interval notation
`f`	Absolute frequency
`rf`	Relative frequency
`rf(%)`	Relative frequency (%)
`cf`	Cumulative frequency
`cf(%)`	Cumulative frequency (%)

1.2 Choosing the number of classes

# Sturges (default)
fdt(x, breaks = "Sturges")

# Scott
fdt(x, breaks = "Scott")

# Freedman-Diaconis
fdt(x, breaks = "FD")

# Fixed number of classes
fdt(x, k = 8)

1.3 Custom interval boundaries

# Fixed start, end and width
ft2 <- fdt(x,
           start = 4,
           end   = 16,
           h     = 2)
ft2
#>  Class limits  f   rf rf(%)  cf cf(%)
#>         [4,6)  5 0.03   2.5   5   2.5
#>         [6,8) 27 0.14  13.5  32  16.0
#>        [8,10) 71 0.36  35.5 103  51.5
#>       [10,12) 66 0.33  33.0 169  84.5
#>       [12,14) 28 0.14  14.0 197  98.5
#>       [14,16)  3 0.01   1.5 200 100.0

1.4 Formatting class limits

Use format.classes = TRUE together with pattern to control the number of decimal places displayed in the class limits:

# Two decimal places
print(ft,
      format.classes = TRUE,
      pattern        = "%.2f")
#>    Class limits  f   rf rf(%)  cf cf(%)
#>    [3.97, 5.26)  4 0.02   2.0   4   2.0
#>    [5.26, 6.55)  4 0.02   2.0   8   4.0
#>    [6.55, 7.84) 20 0.10  10.0  28  14.0
#>    [7.84, 9.12) 36 0.18  18.0  64  32.0
#>   [9.12, 10.41) 57 0.28  28.5 121  60.5
#>  [10.41, 11.70) 44 0.22  22.0 165  82.5
#>  [11.70, 12.98) 25 0.12  12.5 190  95.0
#>  [12.98, 14.27)  8 0.04   4.0 198  99.0
#>  [14.27, 15.56)  2 0.01   1.0 200 100.0

# Summary with the same formatting
summary(ft,
        format.classes = TRUE,
        pattern        = "%.2f")
#>    Class limits  f   rf rf(%)  cf cf(%)
#>    [3.97, 5.26)  4 0.02   2.0   4   2.0
#>    [5.26, 6.55)  4 0.02   2.0   8   4.0
#>    [6.55, 7.84) 20 0.10  10.0  28  14.0
#>    [7.84, 9.12) 36 0.18  18.0  64  32.0
#>   [9.12, 10.41) 57 0.28  28.5 121  60.5
#>  [10.41, 11.70) 44 0.22  22.0 165  82.5
#>  [11.70, 12.98) 25 0.12  12.5 190  95.0
#>  [12.98, 14.27)  8 0.04   4.0 198  99.0
#>  [14.27, 15.56)  2 0.01   1.0 200 100.0

1.5 Right-closed intervals

By default intervals are left-closed [a, b). Use right = TRUE for right-closed (a, b]:

fdt(x, right = TRUE)

1.6 Missing values

x_na <- c(x, 
          NA, 
          NA)

# This errors by design:
tryCatch(fdt(x_na), error = function(e) message("Error: ", e$message))
#> Error: The data has <NA> values and na.rm=FALSE by default.

# Remove NAs explicitly:
fdt(x_na, na.rm = TRUE)

2. Plots — `plot.fdt.default()`

All plot types are selected with the type argument.

2.1 Absolute frequency histogram and polygon

plot(ft, 
     type = "fh", 
     main = "Frequency histogram")
plot(ft, 
     type = "fp", 
     main = "Frequency polygon")

2.2 Relative frequency (proportion and percentage)

plot(ft,
     type = "rfh",
     main = "Relative frequency histogram")
plot(ft,
     type = "rfph",
     main = "Relative frequency (%) histogram")

2.3 Density

plot(ft,
     type = "d",
     main = "Density histogram")

2.4 Cumulative frequency

plot(ft,
     type = "cfp",
     main = "Cumulative frequency polygon")
plot(ft,
     type = "cfpp",
     main = "Cumulative frequency (%) polygon")

2.5 Value labels on bars

plot(ft,
     type    = "fh",
     v       = TRUE,
     v.round = 0,
     main    = "Histogram with counts")

3. Summary statistics from grouped data

Once an fdt object exists, the usual statistics can be computed directly from the grouped (tabulated) data — no access to the original vector is needed.

ft3 <- fdt(x)

mean(ft3)
#> [1] 9.907335
median(ft3)
#> [1] 9.935109
mfv(ft3)          # mode(s)
#> [1] 9.917177
var(ft3)
#> [1] 3.842699
sd(ft3)
#> [1] 1.96028

# Quartiles (default)
quantile(ft3)
#>      25% 
#> 8.621638

# Deciles
quantile(ft3,
         i = 1:9,
         probs = seq(0,
                     1,
                     0.1))
#>       10%       20%       30%       40%       50%       60%       70%       80% 
#>  7.320210  8.264103  8.979173  9.483486  9.935109 10.386733 10.965119 11.550176 
#>       90% 
#> 12.468716

4. Multiple numerical variables — `fdt.data.frame()`

When the input is a data frame or matrix, fdt() builds one table per numeric column and returns an fdt.multiple object.

4.1 All numeric columns

ft_iris <- fdt(iris[, 1:4])
ft_iris
#> Sepal.Length 
#>   Class limits  f   rf rf(%)  cf  cf(%)
#>  [4.257,4.671)  9 0.06  6.00   9   6.00
#>  [4.671,5.084) 23 0.15 15.33  32  21.33
#>  [5.084,5.498) 20 0.13 13.33  52  34.67
#>  [5.498,5.911) 31 0.21 20.67  83  55.33
#>  [5.911,6.325) 25 0.17 16.67 108  72.00
#>  [6.325,6.738) 22 0.15 14.67 130  86.67
#>  [6.738,7.152)  9 0.06  6.00 139  92.67
#>  [7.152,7.565)  5 0.03  3.33 144  96.00
#>  [7.565,7.979)  6 0.04  4.00 150 100.00
#> 
#> Sepal.Width 
#>   Class limits  f   rf rf(%)  cf  cf(%)
#>   [1.98,2.254)  4 0.03  2.67   4   2.67
#>  [2.254,2.528) 15 0.10 10.00  19  12.67
#>  [2.528,2.801) 28 0.19 18.67  47  31.33
#>  [2.801,3.075) 36 0.24 24.00  83  55.33
#>  [3.075,3.349) 30 0.20 20.00 113  75.33
#>  [3.349,3.623) 22 0.15 14.67 135  90.00
#>  [3.623,3.896)  9 0.06  6.00 144  96.00
#>   [3.896,4.17)  4 0.03  2.67 148  98.67
#>   [4.17,4.444)  2 0.01  1.33 150 100.00
#> 
#> Petal.Length 
#>   Class limits  f   rf rf(%)  cf  cf(%)
#>   [0.99,1.654) 44 0.29 29.33  44  29.33
#>  [1.654,2.319)  6 0.04  4.00  50  33.33
#>  [2.319,2.983)  0 0.00  0.00  50  33.33
#>  [2.983,3.647)  6 0.04  4.00  56  37.33
#>  [3.647,4.312) 19 0.13 12.67  75  50.00
#>  [4.312,4.976) 29 0.19 19.33 104  69.33
#>   [4.976,5.64) 27 0.18 18.00 131  87.33
#>   [5.64,6.305) 14 0.09  9.33 145  96.67
#>  [6.305,6.969)  5 0.03  3.33 150 100.00
#> 
#> Petal.Width 
#>     Class limits  f   rf rf(%)  cf  cf(%)
#>   [0.099,0.3686) 41 0.27 27.33  41  27.33
#>  [0.3686,0.6381)  9 0.06  6.00  50  33.33
#>  [0.6381,0.9077)  0 0.00  0.00  50  33.33
#>   [0.9077,1.177) 10 0.07  6.67  60  40.00
#>    [1.177,1.447) 26 0.17 17.33  86  57.33
#>    [1.447,1.716) 18 0.12 12.00 104  69.33
#>    [1.716,1.986) 17 0.11 11.33 121  80.67
#>    [1.986,2.255) 15 0.10 10.00 136  90.67
#>    [2.255,2.525) 14 0.09  9.33 150 100.00

4.2 Grouped by a factor

Use the by argument to stratify each numeric variable by a categorical column:

ft_by <- fdt(iris[, c(1, 2, 5)],
             k  = 5,
             by = "Species")
ft_by
#> setosa.Sepal.Length 
#>   Class limits  f   rf rf(%) cf cf(%)
#>  [4.257,4.577)  5 0.10    10  5    10
#>  [4.577,4.897) 11 0.22    22 16    32
#>  [4.897,5.218) 23 0.46    46 39    78
#>  [5.218,5.538)  8 0.16    16 47    94
#>  [5.538,5.858)  3 0.06     6 50   100
#> 
#> setosa.Sepal.Width 
#>   Class limits  f   rf rf(%) cf cf(%)
#>   [2.277,2.71)  1 0.02     2  1     2
#>   [2.71,3.144) 11 0.22    22 12    24
#>  [3.144,3.577) 22 0.44    44 34    68
#>  [3.577,4.011) 13 0.26    26 47    94
#>  [4.011,4.444)  3 0.06     6 50   100
#> 
#> versicolor.Sepal.Length 
#>   Class limits  f   rf rf(%) cf cf(%)
#>  [4.851,5.295)  5 0.10    10  5    10
#>  [5.295,5.739) 16 0.32    32 21    42
#>  [5.739,6.182) 13 0.26    26 34    68
#>  [6.182,6.626) 10 0.20    20 44    88
#>   [6.626,7.07)  6 0.12    12 50   100
#> 
#> versicolor.Sepal.Width 
#>   Class limits  f   rf rf(%) cf cf(%)
#>   [1.98,2.271)  3 0.06     6  3     6
#>  [2.271,2.562) 10 0.20    20 13    26
#>  [2.562,2.852) 14 0.28    28 27    54
#>  [2.852,3.143) 18 0.36    36 45    90
#>  [3.143,3.434)  5 0.10    10 50   100
#> 
#> virginica.Sepal.Length 
#>   Class limits  f   rf rf(%) cf cf(%)
#>  [4.851,5.477)  1 0.02     2  1     2
#>  [5.477,6.102) 10 0.20    20 11    22
#>  [6.102,6.728) 22 0.44    44 33    66
#>  [6.728,7.353) 10 0.20    20 43    86
#>  [7.353,7.979)  7 0.14    14 50   100
#> 
#> virginica.Sepal.Width 
#>   Class limits  f   rf rf(%) cf cf(%)
#>   [2.178,2.51)  5 0.10    10  5    10
#>   [2.51,2.842) 14 0.28    28 19    38
#>  [2.842,3.174) 18 0.36    36 37    74
#>  [3.174,3.506) 10 0.20    20 47    94
#>  [3.506,3.838)  3 0.06     6 50   100

4.3 Plotting multiple tables

plot(ft_iris, type = "fh")

4.4 Statistics on multiple tables

mean(ft_iris)
#> $Sepal.Length
#> [1] 5.850567
#> 
#> $Sepal.Width
#> [1] 3.042258
#> 
#> $Petal.Length
#> [1] 3.735911
#> 
#> $Petal.Width
#> [1] 1.225742

5. Categorical data — `fdt_cat()`

5.1 Basic usage

set.seed(7)
fruits <- sample(c("apple", 
                   "banana", 
                   "cherry",
                   "strawberry",
                   "melon"),
                 size = 150,
                 replace = TRUE)

ft_cat <- fdt_cat(fruits)
ft_cat
#>    Category  f   rf rf(%)  cf  cf(%)
#>      banana 40 0.27 26.67  40  26.67
#>  strawberry 32 0.21 21.33  72  48.00
#>      cherry 28 0.19 18.67 100  66.67
#>       apple 25 0.17 16.67 125  83.33
#>       melon 25 0.17 16.67 150 100.00

By default the table is sorted by descending frequency.

5.2 Preserving natural order

fdt_cat(fruits, sort = FALSE)

5.3 Formatting

print(ft_cat, round = 3)
#>    Category  f    rf  rf(%)  cf   cf(%)
#>      banana 40 0.267 26.667  40  26.667
#>  strawberry 32 0.213 21.333  72  48.000
#>      cherry 28 0.187 18.667 100  66.667
#>       apple 25 0.167 16.667 125  83.333
#>       melon 25 0.167 16.667 150 100.000

5.4 Plots for categorical data

plot(ft_cat,
     type = "fb",
     main = "Frequency bar chart")

plot(ft_cat,
     type = "fd",
     main = "Frequency dotchart")

plot(ft_cat,
     type = "pa",
     main = "Pareto chart")

6. Reconstructing a table from frequencies

If the original data is no longer available but the frequency table is known, make.fdt() and make.fdt_cat() rebuild complete fdt objects.

# Numerical
ft_ref <- fdt(x)

ft_new <- make.fdt(f     = ft_ref$table$f,
                   start = ft_ref$breaks["start"],
                   end   = ft_ref$breaks["end"])

print(ft_new,
      format.classes = TRUE,
      pattern = "%.2f")
#>    Class limits  f   rf rf(%)  cf cf(%)
#>    [3.97, 5.26)  4 0.02   2.0   4   2.0
#>    [5.26, 6.55)  4 0.02   2.0   8   4.0
#>    [6.55, 7.84) 20 0.10  10.0  28  14.0
#>    [7.84, 9.12) 36 0.18  18.0  64  32.0
#>   [9.12, 10.41) 57 0.28  28.5 121  60.5
#>  [10.41, 11.70) 44 0.22  22.0 165  82.5
#>  [11.70, 12.98) 25 0.12  12.5 190  95.0
#>  [12.98, 14.27)  8 0.04   4.0 198  99.0
#>  [14.27, 15.56)  2 0.01   1.0 200 100.0

# Categorical
ft_new_cat <- make.fdt_cat(f = ft_cat$f,
                           categories = ft_cat$Category)
ft_new_cat
#>    Category  f   rf rf(%)  cf  cf(%)
#>      banana 40 0.27 26.67  40  26.67
#>  strawberry 32 0.21 21.33  72  48.00
#>      cherry 28 0.19 18.67 100  66.67
#>       apple 25 0.17 16.67 125  83.33
#>       melon 25 0.17 16.67 150 100.00

7. LaTeX export

For publication-ready LaTeX tables use xtable::xtable() on any fdt object, including quantiles estimated from grouped data:

library(xtable)
ft <- fdt(rnorm(1000, 10, 2))
xtable(quantile(ft, i = 1:3))

ftm <- fdt(iris[, 1:4])
qq <- quantile(ftm, i = 1:3)
attr(qq, "subheadings") <- names(qq)
xtable(qq)

A dedicated vignette covers this workflow in detail:

vignette("latex_fdt", package = "fdth")

Session information

sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 26.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] fdth_1.5-3     rmarkdown_2.31
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39    R6_2.6.1         fastmap_1.2.0    xfun_0.59       
#>  [5] maketools_1.3.2  cachem_1.1.0     knitr_1.51       htmltools_0.5.9 
#>  [9] buildtools_1.0.0 lifecycle_1.0.5  cli_3.6.6        xtable_1.8-8    
#> [13] sass_0.4.10      jquerylib_0.1.4  compiler_4.6.1   sys_3.4.3       
#> [17] tools_4.6.1      evaluate_1.0.5   bslib_0.11.0     yaml_2.3.12     
#> [21] otel_0.2.0       jsonlite_2.0.0   rlang_1.2.0