## 8.3 Visualizing association between two numeric variables

We use a **scatterplot** to show association between two numerical variables.

We’ll use the `ggplot` function that we’ve seen before, along with `geom_point`, to construct a scatterplot.

We’ll provide an example using the `penguins` dataset, examining how bill depth and length are associated among the penguins belonging to the Adelie species.

As shown in the tutorial on preparing and formatting assignments, we can use the `filter` function from the `dplyr` package to easily subset datasets according to some criterion, such as belonging to a specific category.

```
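# Start with the penguins tibble and keep only the Adelie penguins
penguins %>%
  filter(species == "Adelie") %>%
  # Map bill length to the x-axis and bill depth to the y-axis
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  # Draw the points as hollow circles
  geom_point(shape = 1) +
  xlab("Bill length (mm)") +
  ylab("Bill depth (mm)") +
  theme_bw()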
```

In the code chunk above, we have:

- the input tibble `penguins`, followed by the pipe (`%>%`)
- the `filter` function with the criterion used for subsetting, specifically any cases in which the `species` categorical variable equals “Adelie”
- then we provide the `ggplot` function and its `aes` argument, specifying the x- and y-variables to be used
- then we use `geom_point` to tell R to create a scatterplot using points; specifically, `shape = 1` denotes hollow circles
- then we have x and y labels, followed by the `theme_bw` function, telling R to use a black-and-white theme

Notice that the figure caption indicates the number of observations (sample size) used in the plot. In a previous tutorial it was emphasized that one needs to be careful in tallying the actual number of observations being used in a graph or when calculating descriptive statistics. For example, there is one missing value (“NA”) in the bill measurements for the Adelie penguins, hence the sample size of 151 instead of 152.
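One way to keep the plotted sample size explicit is to drop the incomplete rows yourself before plotting, and then count what is left. The sketch below is only an illustration: `toy_penguins` is a small made-up tibble standing in for the real `penguins` data, and the names `adelie_complete`, `n_obs`, and `adelie_plot` are hypothetical. It uses `labs(caption = ...)`, which places a caption inside the plot itself; that is a different thing from a figure caption written in the document.

```r
library(dplyr)
library(ggplot2)

# Made-up toy data (NOT the real penguins measurements); one Adelie row has NAs
toy_penguins <- tibble::tibble(
  species = c("Adelie", "Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 39.5, NA, 36.7, 46.1, 50.0),
  bill_depth_mm = c(18.7, 17.4, NA, 19.3, 13.2, 16.3)
)

# Keep Adelie rows that have both bill measurements
adelie_complete <- toy_penguins %>%
  filter(species == "Adelie", !is.na(bill_length_mm), !is.na(bill_depth_mm))

# Number of observations that will actually appear as points
n_obs <- nrow(adelie_complete)

# Same scatterplot recipe as above, with the sample size shown in the plot
adelie_plot <- adelie_complete %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(shape = 1) +
  labs(
    x = "Bill length (mm)",
    y = "Bill depth (mm)",
    caption = paste0("n = ", n_obs)
  ) +
  theme_bw()

adelie_plot
```

With the toy data, four Adelie rows go in and three remain after removing the row with missing values, which mirrors the 152 versus 151 situation described above.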

Recall that you can use the `skim` or `skim_without_charts` functions to get an overview of a dataset or of a single variable in a dataset, and to figure out how many missing values there are for each variable. You can also use the `summarise` function, as described previously.
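As a minimal sketch of the `summarise` approach, the code below tallies the total number of rows, the number of rows with a missing bill measurement, and the number of rows that would actually be plotted. The tibble `toy_penguins` is made up for illustration (it is not the real `penguins` data), and the column names `n_rows`, `n_missing`, and `n_plotted` are hypothetical choices.

```r
library(dplyr)

# Made-up toy data (NOT the real penguins measurements); one Adelie row has NAs
toy_penguins <- tibble::tibble(
  species = c("Adelie", "Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 39.5, NA, 36.7, 46.1, 50.0),
  bill_depth_mm = c(18.7, 17.4, NA, 19.3, 13.2, 16.3)
)

adelie_counts <- toy_penguins %>%
  filter(species == "Adelie") %>%
  summarise(
    # Total number of Adelie rows
    n_rows = n(),
    # Rows where at least one of the two bill measurements is missing
    n_missing = sum(is.na(bill_length_mm) | is.na(bill_depth_mm)),
    # Rows with both measurements, i.e. the points a scatterplot can show
    n_plotted = n_rows - n_missing
  )

adelie_counts
```

Applied to the real data with the same column names, this kind of tally is one way to confirm the sample size reported alongside a figure.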

### 8.3.1 Interpreting and describing a scatterplot

Things to report when describing a scatterplot:

- is there an association? A “shotgun blast” pattern indicates no. If there is an association, is it *positive* or *negative*?
- if there is an association, is it weak, moderate, or strong?
- is the association *linear*? If not, is there a different pattern like concave down?
- are there any *outlier* observations that lie far from the general trend?

In the scatterplot above, bill length and depth are positively associated, and the association is moderately strong. There are no observations that are strongly inconsistent with the general trend, though one individual with a bill length of around 35 mm and a depth of around 21 mm may be somewhat unusual.

- Using the `penguins` dataset, create a scatterplot of flipper length in relation to body mass, and provide an appropriate figure caption.