6.3 Create a frequency table

Sometimes when the aim is to visualize a single categorical variable, it’s useful to present a frequency table. If your variable has more than, say, 10 unique categories, a table can get messy, and it’s better to create only a bar graph, as described in the next section.

Many straightforward operations, like tabulating counts and calculating descriptive statistics, can be done using the functionality of the dplyr package (see the dplyr cheatsheet), which is loaded as part of the tidyverse suite of packages.

Here, we’ll use this functionality to create a frequency table for a categorical variable.

We’ll demonstrate this using the tigerdeaths.csv dataset that you should have imported as part of a suggested activity in the previous section, using code like this:

TIP Some datasets that we use for tutorials need to be imported into an object in your workspace. This is the case with the tigerdeaths dataset, and the code for importing the data into a tibble is below. Other datasets, like the penguins dataset, exist within packages (palmerpenguins), and their objects are already created for you. Most of the time, and particularly for your lab assignments, you need to import a dataset and create an object, as we do below.

# load the tidyverse, which includes readr (read_csv) and dplyr
library(tidyverse)
# here we import the data from a CSV file and put it into a "tibble" object called "tigerdeaths"
tigerdeaths <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/tigerdeaths.csv")
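read_csv() accepts a local file path as well as a URL. As a minimal sketch that runs without an internet connection, the code below writes a small, made-up CSV file to a temporary location and imports it into a tibble. The file contents and the object name toy_import are hypothetical and are not part of the tutorial data.

```r
library(tidyverse)

# write a tiny, made-up CSV file (hypothetical data) to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("activity", "Fishing", "Walking", "Fishing"), path)

# import it into a tibble, just as with the URL above
toy_import <- read_csv(path, show_col_types = FALSE)
toy_import
```

The only difference from the tigerdeaths import is the location passed to read_csv(); the result is a tibble in both cases.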

You would also have gotten an overview of the data as part of the activity, using the skim_without_charts function from the skimr package. This would have shown that the activity variable is of type “character”, which tells us it is a categorical variable, and that it includes 9 unique categories. We also would have seen that there are 88 cases (rows) in the dataset.
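The same three facts can also be checked without skimr, using base R and dplyr. Here is a minimal sketch on a small hypothetical tibble, toy_deaths, whose made-up values stand in for tigerdeaths:

```r
library(tidyverse)

# hypothetical stand-in for tigerdeaths: four made-up rows, not the real data
toy_deaths <- tibble(activity = c("Fishing", "Walking", "Fishing", "Toilet"))

class(toy_deaths$activity)       # "character", so activity is categorical
n_distinct(toy_deaths$activity)  # number of unique categories (3 here)
nrow(toy_deaths)                 # number of cases, i.e. rows (4 here)
```

Run on tigerdeaths itself, these calls should report “character”, 9 and 88, matching the summary described above.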

Let’s write the code to generate the frequency table, using the pipe (“%>%”) approach we learned about in an earlier tutorial. We’ll assign the output to a new object named “tiger.table” that will hold the frequency table. Note that we won’t view the table yet; we’ll do that next.

We’ll provide the code first, then explain it step by step:

tiger.table <- tigerdeaths %>%
  count(activity, sort = TRUE) %>%
  mutate(relative_frequency = n / sum(n)) %>%
  adorn_totals()

  • The first line gives the name of the object (a tibble) that we’re going to create (here, “tiger.table”) and uses the assignment operator (“<-”) to tell R to put the output of our operation into that object. The rest of the first line names the object that we’re going to operate on, here “tigerdeaths”. The “%>%” tells R that we’re not done yet and that more lines of code are coming.
  • The second line uses the count function from the dplyr package to tally how many times each unique value of a variable occurs, in this case the “activity” variable. The argument “sort = TRUE” tells it to sort the categories by count in descending order (largest first); without it (the default is sort = FALSE), the categories are listed in order of their names. Another “%>%” continues the code.
  • The third line uses the mutate function from the dplyr package to create a new variable; the arguments in the parentheses tell R what that variable should be called, here “relative_frequency”, and how to calculate it.
  • The n in the third line refers to the column named “n” that count created in the previous step, which holds the number of observations in each category; sum(n) adds these up to give the total sample size. Thus, n / sum(n) calculates the relative frequency (equivalent to the proportion) of all observations that fall within each category.
  • The adorn_totals function in the last line comes from the janitor package (load it with library(janitor) if it isn’t already loaded) and adds row and/or column totals to a table; by default it appends a “Total” row (see the help for this function for more details).
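To see what each step produces on its own, here is a minimal sketch that runs the same three steps on a tiny hypothetical tibble (six made-up observations, not the tiger data; the object names toy, toy_counts, toy_freq and toy_table are illustrative only):

```r
library(tidyverse)
library(janitor)

# hypothetical data: 6 made-up observations
toy <- tibble(activity = c("Fishing", "Walking", "Fishing",
                           "Herding", "Fishing", "Walking"))

# step 1: count() returns one row per category, with tallies in a column named n
toy_counts <- toy %>% count(activity, sort = TRUE)
toy_counts  # Fishing 3, Walking 2, Herding 1

# step 2: mutate() divides each count by the total, sum(n) = 6
toy_freq <- toy_counts %>% mutate(relative_frequency = n / sum(n))

# step 3: adorn_totals() appends a "Total" row summing the numeric columns
toy_table <- toy_freq %>% adorn_totals()
toy_table
```

Printing toy_counts shows that n is an ordinary column created by count(), which is why the mutate step can refer to it by name.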

Try figuring out how you would change the mutate line (the third line) of the code above so that the table shows the percent, rather than the relative frequency, of observations in each category.

Now that we’ve created the frequency table, let’s have a look at it.

In a supplementary tutorial, you’ll find instructions on how to create fancy tables for output. Here, you’ll learn the basics.

For our straightforward approach to tables with table headings (or captions), we’ll use the kable function that comes with the knitr package, using the pipe approach:

library(knitr)  # provides the kable function
tiger.table %>%
  kable(caption = "Frequency table showing the activities of 88 people at the time they were attacked and killed by tigers near Chitwan National Park, Nepal, from 1979 to 2006", digits = 3)
Table 6.1: Frequency table showing the activities of 88 people at the time they were attacked and killed by tigers near Chitwan National Park, Nepal, from 1979 to 2006

activity                 n   relative_frequency
Grass/fodder            44   0.500
Forest products         11   0.125
Fishing                  8   0.091
Herding                  7   0.080
Disturbing tiger kill    5   0.057
Fuelwood/timber          5   0.057
Sleeping in house        3   0.034
Walking                  3   0.034
Toilet                   2   0.023
Total                   88   1.000

The key arguments to the kable function are the table object (which here we supply through the pipe) and the table heading (caption).

Notice that this produces a nicely formatted table with an appropriately worded caption. The argument “digits = 3” tells kable to round numeric values to 3 decimal places.
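To see what digits = 3 does in isolation, here is a minimal sketch on a hypothetical one-row table (made-up values; toy_freq is not a tutorial object). kable passes digits to round(), so 1/3 should appear as 0.333:

```r
library(knitr)

# hypothetical one-row table; 1/3 has many decimal places
toy_freq <- data.frame(activity = "Fishing", n = 1L, relative_frequency = 1/3)

# digits = 3 rounds the numeric columns to 3 decimal places
out <- kable(toy_freq, digits = 3, caption = "A hypothetical frequency table")
out
```

Outside an R Markdown document, kable prints a plain-text (pipe) table; once knitted, the same call produces a formatted table with its caption.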

You now know how to create a frequency table for a categorical variable!

Your table caption won’t include a number (e.g. Table 1) until you actually knit to PDF. Be sure to check your PDF to ensure that the table captions show up, and are numbered!

  1. Frequency table: Try creating a frequency table using the birds dataset, which includes data about four types of birds observed at a wetland.