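```r
library(tidyverse)  # data import (readr), wrangling (dplyr) and plotting (ggplot2)
library(skimr)      # skim(): quick data summaries
library(broom)      # tidy(), glance(), augment(): model output as tibbles
library(knitr)      # kable(): formatted tables
```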
We’ll use the “plantbiomass” dataset:
```
## Rows: 161 Columns: 2
## ── Column specification ────────────────────────────────────────
## Delimiter: ","
## dbl (2): nSpecies, biomassStability
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```
The `plantbiomass` dataset (Example 17.3 in the textbook) contains plant biomass data measured after 10 years in each of 5 experimental “species richness” treatments, in which plants were grown in groups of 1, 2, 4, 8, or 16 species. The treatment variable is `nSpecies`, and the response variable, `biomassStability`, is a measure of ecosystem stability. The research hypothesis was that increasing diversity (species richness) would lead to increased ecosystem stability.
Below is a visualization of the data.
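The figure itself did not survive in this copy. Below is a minimal sketch of one way to draw such a plot with ggplot2, assuming a scatterplot of stability against species richness with horizontally jittered points. Because the path to the original data file is not given here, the block builds a hypothetical stand-in tibble, `plantbiomass_sim`, with the same two column names; its values are random and are NOT the real measurements.

```r
library(tidyverse)

# Hypothetical stand-in for plantbiomass (random values, NOT the real data):
# same two columns, with nSpecies cycling through the levels 1, 2, 4, 8, 16
set.seed(1)
plantbiomass_sim <- tibble(
  nSpecies = rep(c(1, 2, 4, 8, 16), length.out = 161),
  biomassStability = rnorm(161, mean = 4 + 0.2 * nSpecies, sd = 1)
)

stability_plot <- ggplot(plantbiomass_sim,
                         aes(x = nSpecies, y = biomassStability)) +
  # jitter horizontally only, so overlapping points within a treatment show up
  geom_jitter(width = 0.2, height = 0, shape = 1) +
  labs(x = "Species richness (number of species)",
       y = "Biomass stability") +
  theme_bw()

print(stability_plot)
```

With the real data loaded, replacing `plantbiomass_sim` with `plantbiomass` should produce the corresponding plot.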
When one visualizes the data, one might ask why an ANOVA isn’t the analysis method of choice. If we were only interested in testing the null hypothesis that “there is no difference in mean ecosystem stability among the species richness treatment groups”, then an ANOVA, with species richness treated as a categorical variable, would be appropriate. Here, however, we want more than a test for differences among treatment groups: we want to quantify whether and how ecosystem stability varies with species richness, AND, if it does, whether we can reliably predict ecosystem stability from species richness. For this, we need a regression model.
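The difference between the two approaches comes down to how `nSpecies` enters the model formula. The sketch below fits both to a hypothetical simulated stand-in (`plantbiomass_sim`, random values, NOT the real data): `factor(nSpecies)` gives a one-way ANOVA that only compares the five group means, while numeric `nSpecies` gives a regression with a slope that can be used for prediction.

```r
library(tidyverse)
library(broom)

# Hypothetical stand-in (random values, NOT the real plantbiomass data)
set.seed(1)
plantbiomass_sim <- tibble(
  nSpecies = rep(c(1, 2, 4, 8, 16), length.out = 161),
  biomassStability = rnorm(161, mean = 4 + 0.2 * nSpecies, sd = 1)
)

# ANOVA: species richness as a categorical factor with 5 levels;
# tests only whether group means differ (4 numerator degrees of freedom)
anova_fit <- lm(biomassStability ~ factor(nSpecies), data = plantbiomass_sim)
anova(anova_fit)

# Regression: species richness as numeric; estimates an intercept and a
# slope (change in mean stability per additional species)
reg_fit <- lm(biomassStability ~ nSpecies, data = plantbiomass_sim)
tidy(reg_fit)

# The regression can predict stability at a richness value that was
# never used as a treatment level
pred_6 <- predict(reg_fit, newdata = tibble(nSpecies = 6))
pred_6
```

The ANOVA fit cannot make that last prediction: `predict(anova_fit, newdata = tibble(nSpecies = 6))` fails because 6 is not one of the factor levels the model was fitted with.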
Let’s have a look at the data:
| Data summary | |
|:---|:---|
| Number of rows | 161 |
| Number of columns | 2 |
| Column type frequency: | |
| numeric | 2 |
Variable type: numeric
We see that both variables are numeric, and that there are 161 observations overall.
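The summary above appears to be `skim()` output from skimr. A minimal sketch of producing it, plus checking the same structural facts directly, is below; it again uses a hypothetical simulated stand-in (`plantbiomass_sim`, random values, NOT the real data), so the numbers it prints will not match the real dataset.

```r
library(tidyverse)
library(skimr)

# Hypothetical stand-in (random values, NOT the real plantbiomass data)
set.seed(1)
plantbiomass_sim <- tibble(
  nSpecies = rep(c(1, 2, 4, 8, 16), length.out = 161),
  biomassStability = rnorm(161, mean = 4 + 0.2 * nSpecies, sd = 1)
)

# skim() prints the data summary (rows, columns, column types) followed
# by per-variable statistics such as mean, sd and quantiles
skim_out <- skim(plantbiomass_sim)
skim_out

# The same structural facts, checked directly
nrow(plantbiomass_sim)              # number of observations
col_classes <- map_chr(plantbiomass_sim, class)
col_classes                         # both columns are "numeric"
count(plantbiomass_sim, nSpecies)   # observations per treatment level
```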