
Tutorials for BIOL202: Introduction to Biostatistics

Copyright

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Please use the following citation for this document:

Pither, J. (2022). Tutorials for BIOL202: Introduction to Biostatistics. https://ubco-biology.github.io/BIOL202/index.html

All source files are available on GitHub.