## 12.5 \(\chi^{2}\) Contingency Test

When the contingency table has dimensions greater than 2 x 2, the most commonly applied test is the \(\chi^{2}\) contingency test.

For this activity we’re using the “worm” data associated with **Example 9.4 on page 244** of the text. Please read the example!

### 12.5.1 Hypothesis statement

As shown in the text example, we have a 2 x 3 contingency table, and we’re testing for an association between two categorical variables.

Here are the null and alternative hypotheses (compare these to what’s written in the text):

**H\(_{0}\)**: There is no association between the level of trematode parasitism and the frequency (or probability) of being eaten.

**H\(_{A}\)**: There is an association between the level of trematode parasitism and the frequency (or probability) of being eaten.

- We use \(\alpha\) = 0.05.

- It is a two-tailed alternative hypothesis

- We’ll use a contingency test to test the null hypothesis, because this is appropriate for analyzing for association between two categorical variables, and when the resulting contingency table has dimension greater than 2 x 2.

- We will use the \(\chi^{2}\) test statistic, with degrees of freedom equal to (r-1)(c-1), where “r” is the number of rows and “c” is the number of columns, so (2-1)(3-1) = 2.

- We must check the assumptions of the \(\chi^{2}\) contingency test.

- It is always a good idea to present a figure to accompany your analysis; in the case of a contingency test, the figure heading will include information about the sample size / total number of observations.
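As a quick check on the degrees of freedom stated above, here is a minimal base R sketch that computes (r-1)(c-1) for a 2 x 3 table and looks up the corresponding critical value at \(\alpha\) = 0.05. The object names (`n.rows`, `n.cols`, `deg.free`, `crit.value`) are illustrative and do not come from the text.

```r
# Dimensions of the contingency table (2 fates x 3 infection levels)
n.rows <- 2
n.cols <- 3

# Degrees of freedom for a contingency test: (r - 1)(c - 1)
deg.free <- (n.rows - 1) * (n.cols - 1)
deg.free

# Critical value of the chi-squared distribution at alpha = 0.05
alpha <- 0.05
crit.value <- qchisq(1 - alpha, df = deg.free)
crit.value
```

A test statistic larger than this critical value (about 5.99 for 2 degrees of freedom) would lead us to reject the null hypothesis at \(\alpha\) = 0.05.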

### 12.5.2 Display the contingency table

Let’s generate a contingency table:

```
# Cross-tabulate fate by infection level, then add row and column totals
worm %>%
  tabyl(fate, infection) %>%
  adorn_totals(where = c("row", "col"))
```

```
## fate highly lightly uninfected Total
## eaten 37 10 1 48
## not eaten 9 35 49 93
## Total 46 45 50 141
```

Hmm, the ordering of the categories of the categorical (factor) variable “infection” is the opposite to what is displayed in the text.

**TIP**
The ordering of the categories is not actually crucial to this type of analysis, but it’s certainly better practice to show them in appropriate order!

In order to change the order of categories in a character variable, here’s what you do (consult this resource for additional info).

First, convert the character variable to a “factor” variable as follows:

`worm$infection <- as.factor(worm$infection)`

Now check the default ordering of the levels using the `levels` function (it should be alphabetical):

`levels(worm$infection)`

`## [1] "highly" "lightly" "uninfected"`

Now change the ordering as follows:

```
# Set the level order explicitly, then confirm it
worm$infection <- factor(worm$infection, levels = c("uninfected", "lightly", "highly"))
levels(worm$infection)
```

`## [1] "uninfected" "lightly" "highly"`

Now re-display the contingency table, first storing it in an object “worm.table”:

```
worm.table <- worm %>%
  tabyl(fate, infection) %>%
  adorn_totals(where = c("row", "col"))
worm.table
```

```
## fate uninfected lightly highly Total
## eaten 1 10 37 48
## not eaten 49 35 9 93
## Total 50 45 46 141
```

That’s better!
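If you don’t have the “worm” data file at hand, the cell counts shown above are enough to rebuild the table. Here is a minimal base R sketch. The object name `worm.counts` is our own, and the result is a plain matrix rather than the `tabyl` object used in this tutorial.

```r
# Observed counts copied from the contingency table above
worm.counts <- matrix(
  c( 1, 10, 37,
    49, 35,  9),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    fate = c("eaten", "not eaten"),
    infection = c("uninfected", "lightly", "highly")
  )
)

# Add row and column totals (base R labels them "Sum")
addmargins(worm.counts)
```

The margins should match the totals displayed above: 48 and 93 for the rows, 50, 45 and 46 for the columns, and 141 overall.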

Visualize it nicely with `kable`:

`kable(worm.table)`

fate | uninfected | lightly | highly | Total |
---|---|---|---|---|
eaten | 1 | 10 | 37 | 48 |
not eaten | 49 | 35 | 9 | 93 |
Total | 50 | 45 | 46 | 141 |

### 12.5.3 Visualize a mosaic plot

Here’s a mosaic plot with an ideal figure caption included:

```
worm %>%
  ggplot() +
  geom_mosaic(aes(x = product(infection), fill = fate)) +
  scale_fill_manual(values = c("darkred", "gold")) +  # one colour per fate
  scale_y_continuous(breaks = seq(0, 1, by = 0.2)) +
  xlab("Level of infection") +
  ylab("Relative frequency") +
  theme_bw()
```

In the code above we manually specified the two fill colours using the `scale_fill_manual` function.

### 12.5.4 Check the assumptions

The \(\chi^{2}\) contingency test (also known as an association test) has assumptions that **must be checked** prior to proceeding:

- none of the categories should have an expected frequency of less than one
- no more than 20% of the categories should have expected frequencies less than five

To test these assumptions, we need to actually conduct the test, because in doing so R calculates the **expected frequencies** for us.

Conduct the test using the `chisq.test` function from the `janitor` package. **NOTE**: this again overlaps with a function name from base R, so we’ll need to specify that we want the “janitor” version of the function.

We need our contingency table as input to the function, but this time without margin totals.

We’ll assign the results to an object “worm.chisq.results”:

```
worm.chisq.results <- worm %>%
  tabyl(fate, infection) %>%
  janitor::chisq.test()
```

Have a look at the output:

`worm.chisq.results`

```
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 69.756, df = 2, p-value = 7.124e-16
```

Although only a few bits of information are provided here, the object actually contains a lot more information.

Don’t try getting an overview of the object using our usual `skim_without_charts` approach; that won’t work.

Instead, simply use this code:

`names(worm.chisq.results)`

```
## [1] "statistic" "parameter" "p.value" "method" "data.name" "observed"
## [7] "expected" "residuals" "stdres"
```

As you can see, one of the names is `expected`. This is what holds our expected frequencies (and note that these values do not need to be whole numbers, unlike the observed frequencies):

`kable(worm.chisq.results$expected)`

fate | uninfected | lightly | highly |
---|---|---|---|
eaten | 17.02128 | 15.31915 | 15.65957 |
not eaten | 32.97872 | 29.68085 | 30.34043 |

We see that all our assumptions are met: none of the cells (cross-classified categories) in the table have an expected frequency of less than one, and no more than 20% of the cells have expected frequencies less than five.
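To see where these expected frequencies come from, the following base R sketch computes them by hand as (row total x column total) / grand total, and then checks both assumptions programmatically. It is only an illustration: the counts are re-entered from the contingency table shown earlier, and the object names `worm.counts` and `expected` are our own.

```r
# Observed counts, re-entered from the contingency table
worm.counts <- matrix(
  c( 1, 10, 37,
    49, 35,  9),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    fate = c("eaten", "not eaten"),
    infection = c("uninfected", "lightly", "highly")
  )
)

# Expected frequency for each cell: (row total x column total) / grand total
expected <- outer(rowSums(worm.counts), colSums(worm.counts)) / sum(worm.counts)
round(expected, 5)

# Assumption 1: no cell has an expected frequency of less than one
all(expected >= 1)

# Assumption 2: no more than 20% of cells have expected frequencies less than five
mean(expected < 5) <= 0.2
```

Both checks should return `TRUE`, in agreement with what we concluded from reading the table.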

### 12.5.5 Get the results of the test

We can see the results of the \(\chi^{2}\) test by simply typing the name of the results object:

`worm.chisq.results`

```
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 69.756, df = 2, p-value = 7.124e-16
```

This shows a very large value of \(\chi^{2}\) (69.76) and a very small *P*-value, much smaller than our stated \(\alpha\). So we reject the null hypothesis.
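For readers who want to see how these numbers arise, here is a hedged base R sketch that reproduces the test statistic and *P*-value by hand, summing (observed - expected)\(^{2}\) / expected over all cells. The counts are re-entered from the contingency table, and the object names are illustrative rather than part of the tutorial’s workflow.

```r
# Observed counts, re-entered from the contingency table
worm.counts <- matrix(
  c( 1, 10, 37,
    49, 35,  9),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    fate = c("eaten", "not eaten"),
    infection = c("uninfected", "lightly", "highly")
  )
)

# Expected counts under the null hypothesis of no association
expected <- outer(rowSums(worm.counts), colSums(worm.counts)) / sum(worm.counts)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi.sq <- sum((worm.counts - expected)^2 / expected)

# Degrees of freedom and upper-tail P-value
deg.free <- (nrow(worm.counts) - 1) * (ncol(worm.counts) - 1)
p.value <- pchisq(chi.sq, df = deg.free, lower.tail = FALSE)

c(chi.sq = chi.sq, df = deg.free, p.value = p.value)
```

The statistic should agree with the reported value of 69.756 on 2 degrees of freedom, with a *P*-value far below 0.001.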

**Concluding statement**

The probability of being eaten is significantly associated with the level of trematode parasitism (\(\chi^{2}\) contingency test; *df* = 2; \(\chi^{2}\) = 69.76; *P* < 0.001). Based on our mosaic plot (Fig. X), the probability of being eaten increases substantially with increasing intensity of parasitism.
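The direction of the association can also be read from the relative frequencies that the mosaic plot displays. As a minimal sketch in base R (again re-entering the counts from the contingency table, with object names of our own choosing), the proportion of each fate within each infection level is:

```r
# Observed counts, re-entered from the contingency table
worm.counts <- matrix(
  c( 1, 10, 37,
    49, 35,  9),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    fate = c("eaten", "not eaten"),
    infection = c("uninfected", "lightly", "highly")
  )
)

# Proportion of each fate within each infection level (each column sums to 1)
fate.props <- prop.table(worm.counts, margin = 2)
round(fate.props, 3)
```

The proportion eaten rises from 0.02 among uninfected individuals to about 0.22 among lightly infected and about 0.80 among highly infected individuals.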