6.3 Create a frequency table
When the aim is to visualize a single categorical variable, it’s sometimes useful to present a frequency table. If your variable has more than, say, 10 unique categories, a table can become messy, and it is better to create only a bar graph, as described in the next section.
Many straightforward operations like tabulation and calculating descriptive statistics can be done using the functionality of the dplyr package (see the cheatsheet here), which gets loaded as part of the tidyverse suite of packages.
Here, we’ll use this functionality to create a frequency table for a categorical variable.
We’ll demonstrate this using the tigerdeaths.csv dataset that you should have imported as part of a suggested activity in the previous section, using code like this:
TIP
Some datasets that we use for tutorials need to be imported into an object in your workspace. This is the case with the tigerdeaths dataset, and the code for importing the data into a tibble is below. Other datasets, like the penguins dataset, exist within packages (palmerpenguins), and their objects are already created for you. Most of the time, and particularly for your lab assignments, you need to import a dataset and create an object, as we do below.
# here we import the data from a CSV file and put it into a "tibble" object called "tigerdeaths"
tigerdeaths <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/tigerdeaths.csv")
You would also have gotten an overview of the data as part of the activity, using the skim_without_charts function. This would have shown that the activity variable is of type “character”, which tells us it is a categorical variable, and that it includes 9 unique categories. We also would have seen that there are 88 cases (rows) in the dataset.
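If you want to confirm those numbers directly, a minimal sketch along the following lines checks the variable type, the number of unique categories, and the number of rows. It assumes the dplyr package is installed. So that it runs without the CSV file, it rebuilds a stand-in tibble from the category counts reported in the frequency table in this section; the object name tigerdeaths_demo is hypothetical and not part of the tutorial.

```r
library(dplyr)

# Stand-in for the imported data (hypothetical object name): one row per
# case, rebuilt from the category counts reported in this section
tigerdeaths_demo <- tibble(
  activity = rep(
    c("Grass/fodder", "Forest products", "Fishing", "Herding",
      "Disturbing tiger kill", "Fuelwood/timber", "Sleeping in house",
      "Walking", "Toilet"),
    times = c(44, 11, 8, 7, 5, 5, 3, 3, 2)
  )
)

class(tigerdeaths_demo$activity)       # type of the variable
n_distinct(tigerdeaths_demo$activity)  # number of unique categories
nrow(tigerdeaths_demo)                 # number of cases (rows)
```

With your own imported data you would run the same three calls on tigerdeaths instead of the stand-in object.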
Let’s provide the code to generate the frequency table, using the pipes “%>%” approach we learned about in an earlier tutorial. We’ll assign the output to a new object that will hold the frequency table. We’ll name the object “tiger.table”. Note that we won’t yet view the table here… we’ll do that next.
We’ll provide the code first, then explain it step-by-step after.
Here’s the code for creating the frequency table and assigning it to a new object named “tiger.table”:
tiger.table <- tigerdeaths %>%
  count(activity, sort = TRUE) %>%
  mutate(relative_frequency = n / sum(n)) %>%
  adorn_totals()
- The first line gives the name of the object (a tibble) that we’re going to create (here, “tiger.table”) and uses the assignment operator (“<-”) to tell R to put the output of our operation into that object. It then names the object that we’re going to work with, here “tigerdeaths”. The “%>%” tells R that we’re not done yet and that more lines of code follow.
- The second line uses the count function from the dplyr package to tally the unique values of a variable, in this case the “activity” variable. The argument “sort = TRUE” tells it to sort the categories by count in descending order, so the most frequent category comes first. Then another “%>%” continues the code.
- The third line uses the mutate function from the dplyr package, which creates a new variable. The arguments in the parentheses tell R what that variable should be called, here “relative_frequency”, and how to calculate it.
- The n in the third line is the column of counts that count created in the second line, holding the number of observations in each category, and sum(n) adds these up to give the total sample size. Thus, n / sum(n) calculates the relative frequency (equivalent to the proportion) of all observations that fall within the given category.
- The adorn_totals function in the last line comes from the janitor package and adds row and/or column totals to tables (see the help for this function for more details).
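To see what each step contributes, the pipeline can be run one stage at a time. The sketch below is illustrative only: it assumes the dplyr and janitor packages are installed, uses a stand-in tibble rebuilt from the counts reported in this section, and stores the intermediate results in hypothetical objects named step1, step2, and step3.

```r
library(dplyr)
library(janitor)

# Stand-in for the imported data (hypothetical object name)
tigerdeaths_demo <- tibble(
  activity = rep(
    c("Grass/fodder", "Forest products", "Fishing", "Herding",
      "Disturbing tiger kill", "Fuelwood/timber", "Sleeping in house",
      "Walking", "Toilet"),
    times = c(44, 11, 8, 7, 5, 5, 3, 3, 2)
  )
)

# Step 1: tally each category; the counts land in a column named "n"
step1 <- tigerdeaths_demo %>%
  count(activity, sort = TRUE)

# Step 2: divide each count by the total to get the proportion
step2 <- step1 %>%
  mutate(relative_frequency = n / sum(n))

# Step 3: append a "Total" row that sums the numeric columns
step3 <- step2 %>%
  adorn_totals()

step3
```

Printing step1 and step2 as well shows how the table grows by one column, and then one row, as each function is applied.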
Try figuring out how you would change the mutate line (the third line) of code in the chunk above so that the table shows the percent rather than the relative frequency of observations in each category.
Now that we’ve created the frequency table, let’s have a look at it.
In a supplementary tutorial, you’ll find instructions on how to create fancy tables for output. Here, you’ll learn the basics.
For our straightforward approach to tables with table headings (or captions), we’ll use the kable function that comes with the knitr package, using the pipe approach:
tiger.table %>%
  kable(caption = "Frequency table showing the activities of 88 people at the time they were attacked and killed by tigers near Chitwan National Park, Nepal, from 1979 to 2006", digits = 3)
activity | n | relative_frequency |
---|---|---|
Grass/fodder | 44 | 0.500 |
Forest products | 11 | 0.125 |
Fishing | 8 | 0.091 |
Herding | 7 | 0.080 |
Disturbing tiger kill | 5 | 0.057 |
Fuelwood/timber | 5 | 0.057 |
Sleeping in house | 3 | 0.034 |
Walking | 3 | 0.034 |
Toilet | 2 | 0.023 |
Total | 88 | 1.000 |
The key arguments to the kable function are the table object (which here we provide before the pipe) and the table heading (caption).
Notice that this produces a nicely formatted table with an appropriately worded caption. The argument “digits = 3” tells it to round numeric values to 3 decimal places in the table.
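To see the effect of the digits argument in isolation, here is a small sketch that assumes the knitr package is installed. The two-row data frame demo_table and its caption are made up for illustration, using two of the counts from the table above.

```r
library(knitr)

# Hypothetical two-row table with unrounded proportions
demo_table <- data.frame(
  activity = c("Fishing", "Herding"),
  n = c(8, 7),
  relative_frequency = c(8 / 88, 7 / 88)
)

# Without digits, the proportions are shown with more decimal places
kable(demo_table, caption = "Demo table without rounding")

# With digits = 3, numeric columns are rounded to 3 decimal places
kable(demo_table, caption = "Demo table rounded to 3 decimal places", digits = 3)
```

The rounding affects only how the values are displayed; the numbers stored in demo_table are unchanged.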
You now know how to create a frequency table for a categorical variable!
Your table caption won’t include a number (e.g. Table 1) until you actually knit to PDF. Be sure to check your PDF to ensure that the table captions show up, and are numbered!
- Frequency table: Try creating a frequency table using the birds dataset, which includes data about four types of birds observed at a wetland.