We use a bar graph to visualize the frequency distribution for a single categorical variable.
We’ll use the ggplot approach with its geom_bar function to create a bar graph. The ggplot function comes with the ggplot2 package, which itself is loaded as part of the tidyverse package.
To produce the bar graph, we use a frequency table as the input. Thus, let’s repeat the creation of the “tiger.table” from the preceding section, but this time we exclude the adorn_totals line of code, because we don’t want the “total” row to be plotted in the bar graph.
tiger.table <- tigerdeaths %>%
  count(activity, sort = TRUE) %>%
  mutate(relative_frequency = n / sum(n))
Recall that the “tiger.table” is a sort of summary presentation of the “activity” variable:
## # A tibble: 9 × 3
##   activity                  n relative_frequency
##   <chr>                 <int>              <dbl>
## 1 Grass/fodder             44                0.5
## 2 Forest products          11              0.125
## 3 Fishing                   8            0.09091
## 4 Herding                   7            0.07955
## 5 Disturbing tiger kill     5            0.05682
## 6 Fuelwood/timber           5            0.05682
## 7 Sleeping in house         3            0.03409
## 8 Walking                   3            0.03409
## 9 Toilet                    2            0.02273
It shows the total counts (frequencies) of individuals in each of the nine “activity” categories.
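As a quick check on the table above, here is a minimal sketch that rebuilds it from the printed counts alone, without the raw “tigerdeaths” data. The object names “tiger.counts” and “tiger.check” are made up for this illustration; the counts themselves are copied from the output above.

```r
library(dplyr)

# Counts copied from the printed "tiger.table"; the object names are
# hypothetical and used only for this sketch.
tiger.counts <- tibble(
  activity = c("Grass/fodder", "Forest products", "Fishing", "Herding",
               "Disturbing tiger kill", "Fuelwood/timber",
               "Sleeping in house", "Walking", "Toilet"),
  n = c(44L, 11L, 8L, 7L, 5L, 5L, 3L, 3L, 2L)
)

# Same calculation as in the original pipeline: each count divided by the total
tiger.check <- tiger.counts %>%
  mutate(relative_frequency = n / sum(n))

sum(tiger.check$n)                   # total number of individuals
sum(tiger.check$relative_frequency)  # relative frequencies should sum to 1
```

The first relative frequency, 44 divided by the total of 88, gives the 0.5 shown in the table.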
And although in the code chunk below you’ll see that we provide an “x” and a “y” variable for creating the graph, remember that we’re really only visualizing a single categorical variable.
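Before the main code, a brief aside on that last point. The sketch below is one way to see that only a single variable is involved: it rebuilds a stand-in for the raw data (one row per individual) from the printed counts, and lets geom_bar do the counting itself. The object names “tiger.raw” and “p.raw” are hypothetical; the real raw data live in “tigerdeaths”.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the raw data: one row per individual, rebuilt
# from the counts in the printed frequency table.
tiger.raw <- tibble(
  activity = rep(
    c("Grass/fodder", "Forest products", "Fishing", "Herding",
      "Disturbing tiger kill", "Fuelwood/timber",
      "Sleeping in house", "Walking", "Toilet"),
    times = c(44, 11, 8, 7, 5, 5, 3, 3, 2)
  )
)

# With only an "x" aesthetic, geom_bar() counts the rows in each category
# itself (its default is stat = "count"), so no "y" variable is supplied.
p.raw <- ggplot(data = tiger.raw, aes(x = activity)) +
  geom_bar() +
  ylab("Frequency") +
  xlab("Activity") +
  coord_flip() +
  theme_bw()

p.raw
```

The bars in this version are not sorted by frequency, which is one reason to start from the frequency table instead.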
Let’s provide the code first, and explain after.
ggplot(data = tiger.table, aes(x = reorder(activity, n), y = n)) +
  geom_bar(stat = "identity") +
  ylab("Frequency") +
  xlab("Activity") +
  coord_flip() +
  theme_bw()
All figures produced using the ggplot2 package start with the ggplot function. The code then proceeds as follows:
- The tibble (or dataframe) that holds the data (“data = tiger.table”)
- An “aes” argument (which stands for “aesthetics”), within which one specifies the variables to be plotted; here we’re plotting the frequencies from the “n” variable in the frequency table as the “y” variable, and the “activity” categorical variable as the “x” variable. To ensure the proper sorting of the bars, we use the reorder function, telling R to reorder the activity categories according to the frequencies in the “n” variable
- Then there’s a plus sign (“+”) to tell the ggplot function we’re not done yet with our graph: there are more lines of code coming (think of it as ggplot’s version of the “pipe”)
- Then the type of graph, which uses a function starting with “geom”; here we want a bar graph, hence geom_bar
- The geom_bar function has its own argument: “stat = "identity"” tells it to make the height of each bar equal to the value provided in the “y” variable, here “n”
- The ylab function sets the y-axis label
- The xlab function sets the x-axis label
- The coord_flip function swaps the x and y axes so that the bars run horizontally; this makes it easier to fit the activity labels on the graph
- Then the theme_bw function indicates we want a simple black-and-white theme
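Two of the steps above can be tried in isolation. The sketch below uses a hypothetical three-row table (“small.table”, with counts copied from the frequency table but deliberately out of order) to show what reorder does to the category order. It also uses geom_col, which ggplot2 provides as a shorthand for geom_bar(stat = "identity"); this is an alternative, not the code used in this section.

```r
library(ggplot2)

# A few rows copied from the printed frequency table, deliberately out of
# order; "small.table" is a made-up name for this sketch.
small.table <- data.frame(
  activity = c("Fishing", "Grass/fodder", "Toilet"),
  n = c(8, 44, 2)
)

# reorder() returns a factor whose levels are sorted by "n", smallest first
levels(reorder(small.table$activity, small.table$n))

# geom_col() takes bar heights directly from the "y" variable,
# the same behaviour as geom_bar(stat = "identity")
p.col <- ggplot(data = small.table, aes(x = reorder(activity, n), y = n)) +
  geom_col() +
  ylab("Frequency") +
  xlab("Activity") +
  coord_flip() +
  theme_bw()

p.col
```

Because coord_flip places the first factor level at the bottom of the flipped axis, sorting from smallest to largest puts the most frequent category at the top of the graph.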
There you have it: a nicely formatted bar graph!
REMINDER: Don’t forget to include a good figure caption!
[Image: snapshot of the full code chunk that produced the bar graph above]
- Bar graph: Try creating a bar graph using the birds dataset, which includes data about four types of birds observed at a wetland.