## 11.3 Binomial distribution

The binomial distribution provides the probability distribution for the number of “successes” in a fixed number of independent trials, when the probability of success is the same in each trial.

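Here’s the formula:

$$\Pr[X] = \binom{n}{X} p^X (1 - p)^{n - X}$$

where *n* is the number of trials, *X* is the number of successes, and *p* is the probability of success in each trial.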

It turns out there’s a handy function `dbinom` (part of the `stats` package that ships with base R) that will calculate the exact probability associated with any particular outcome, given the number of trials and the probability of success in each trial. It uses the equation shown above.

Check the help file for the function:

`?dbinom`

**Dice example**

Imagine rolling a fair, 6-sided die *n* = 6 times (six random trials). Let’s consider rolling a “4” a “success”.

What is the probability of observing two fours (i.e. two successes) in our 6 rolls of the die (random trials)?

We have *X* = 2 (the number of successes), *p* = 1/6 (the probability of a success in each trial), and *n* = 6 (the number of trials).

Here’s the code, where “x” represents our “*X*”, “size” represents the number of trials (“*n*”), and “prob” is the probability of success in each trial (here, 1/6).

We’ll create an object to hold the number of trials we wish to use first:

```
# number of trials (rolls of the die)
num.trials <- 6
# probability of exactly 2 successes in num.trials trials
dbinom(x = 2, size = num.trials, prob = 1/6)
```

`## [1] 0.2009388`

Thus, the probability of rolling two fours (i.e. having 2 successes) out of 6 rolls of the die is about 0.201.
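To see what `dbinom` is doing, the following sketch evaluates the standard binomial formula, choose(*n*, *X*) × *p*^*X* × (1 − *p*)^(*n* − *X*), by hand for the dice example and compares the result with `dbinom`. The object names (`n`, `X`, `p`, `by.hand`) are illustrative choices, not part of the original code.

```r
# Binomial formula evaluated by hand for the dice example.
n <- 6      # number of trials (rolls of the die)
X <- 2      # number of successes (fours)
p <- 1/6    # probability of success in each trial

# choose(n, X) counts the ways to arrange X successes among n trials
by.hand <- choose(n, X) * p^X * (1 - p)^(n - X)
by.hand

# Compare with dbinom; all.equal allows for tiny floating-point differences
all.equal(by.hand, dbinom(x = X, size = n, prob = p))
```

The two values should agree, which confirms that `dbinom` is simply a convenient wrapper around this calculation.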

In order to get the probabilities associated with *each* possible outcome (i.e. 0 through 6 successes), we use the code shown in the chunk below.

```
exact.probs.6 <- tibble(
  X = 0:num.trials,
  probs = dbinom(x = 0:num.trials, size = num.trials, prob = 1/6)
)
exact.probs.6
```

```
## # A tibble: 7 × 2
## X probs
## <int> <dbl>
## 1 0 0.3349
## 2 1 0.4019
## 3 2 0.2009
## 4 3 0.05358
## 5 4 0.008038
## 6 5 0.0006430
## 7 6 0.00002143
```

Above we created a new tibble object (using the function `tibble`) called “exact.probs.6”, with a variable “X” that holds each of the possible outcomes (*X* = 0 through 6, the number of trials), and a variable “probs” that holds the probability of each outcome, calculated using the `dbinom` function.

Note that the `dbinom` function will accept a vector of values of `x`, and returns the probability associated with each value.
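Because `dbinom` is vectorized over `x`, one quick sanity check is that the probabilities across all possible outcomes add up to 1. The sketch below assumes the same dice scenario (6 trials, *p* = 1/6); the object name `all.probs` is illustrative.

```r
# One probability per possible outcome (0 through 6 successes)
num.trials <- 6
all.probs <- dbinom(x = 0:num.trials, size = num.trials, prob = 1/6)

# There should be num.trials + 1 values, one for each outcome
length(all.probs)

# The probabilities of all possible outcomes should sum to 1
# (all.equal tolerates floating-point rounding)
all.equal(sum(all.probs), 1)
```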

Now let’s use these exact probabilities to create a barplot showing an exact, discrete probability distribution, corresponding to the binomial distribution with a sample size (number of trials) of *n* = 6 and a probability of success *p* = 1/6:

```
ggplot(exact.probs.6, aes(y = probs, x = X)) +
  geom_bar(stat = "identity", fill = "lightgrey", colour = "black") +
  xlab("Number of successes (X)") +
  ylab("Pr[X]") +
  theme_bw()
```

What if we wished to calculate the probability of getting *at least* two fours in our 6 rolls of the die?

Consult the bar chart above. Recall that “rolling a 4” is our definition of a “success” (it could have been “rolling a 1”, or “rolling a 5” - these all result in the same calculation). Thus to calculate the probability of getting at least 2 successes we need to sum up the probabilities associated with getting 2, 3, 4, 5, and 6 successes.

We can do this using the following R code:

```
probs.2_to_6 <- dbinom(x = 2:num.trials, size = num.trials, prob = 1/6)
sum(probs.2_to_6)
```

`## [1] 0.2632245`

Note that we ask the `dbinom` function to do the calculation for each of the `2:num.trials` outcomes of interest. We store these calculated probabilities in a new object “probs.2_to_6”.

Then, we use the `sum` function to sum up the probabilities within that object.

The resulting value of 0.2632245 looks about right based on our bar chart!
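As a cross-check, the same “at least two” probability can be reached in two other ways. The sketch below uses the complement rule (1 minus the probability of 0 or 1 successes) and the cumulative function `pbinom` from the same family as `dbinom`. The object names are illustrative, and this is an alternative route rather than the method used above.

```r
num.trials <- 6

# Complement rule: Pr[X >= 2] = 1 - (Pr[X = 0] + Pr[X = 1])
at.least.2.complement <- 1 - sum(dbinom(x = 0:1, size = num.trials, prob = 1/6))
at.least.2.complement

# pbinom with lower.tail = FALSE returns Pr[X > q], so q = 1 gives Pr[X >= 2]
at.least.2.pbinom <- pbinom(q = 1, size = num.trials, prob = 1/6,
                            lower.tail = FALSE)
at.least.2.pbinom
```

Both values should match the sum of the probabilities for 2 through 6 successes. The complement approach is handy when the number of trials is large and the “at least” range covers most of the outcomes.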

Now let’s increase the number of trials to *n* = 15, and compare the distribution to that observed using *n* = 6:

```
num.trials <- 15
exact.probs.15 <- tibble(
  X = 0:num.trials,
  probs = dbinom(x = 0:num.trials, size = num.trials, prob = 1/6)
)
exact.probs.15
```

```
## # A tibble: 16 × 2
## X probs
## <int> <dbl>
## 1 0 6.491e- 2
## 2 1 1.947e- 1
## 3 2 2.726e- 1
## 4 3 2.363e- 1
## 5 4 1.418e- 1
## 6 5 6.237e- 2
## 7 6 2.079e- 2
## 8 7 5.346e- 3
## 9 8 1.069e- 3
## 10 9 1.663e- 4
## 11 10 1.996e- 5
## 12 11 1.814e- 6
## 13 12 1.210e- 7
## 14 13 5.583e- 9
## 15 14 1.595e-10
## 16 15 2.127e-12
```

Now plot the binomial probability distribution:

```
ggplot(exact.probs.15, aes(y = probs, x = X)) +
  geom_bar(stat = "identity", fill = "lightgrey", colour = "black") +
  xlab("Number of successes (X)") +
  ylab("Pr[X]") +
  theme_bw()
```
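To compare the two distributions directly, one option is to stack both sets of probabilities into a single data frame and draw them in side-by-side panels. The sketch below is one possible way to do this, assuming the `ggplot2` package is installed; the object names (`both.probs`, `compare.plot`) and the `n.trials` label column are illustrative.

```r
library(ggplot2)

prob.success <- 1/6

# Exact probabilities for each number of trials, with a label column
probs.n6 <- data.frame(n.trials = "n = 6", X = 0:6,
                       probs = dbinom(x = 0:6, size = 6, prob = prob.success))
probs.n15 <- data.frame(n.trials = "n = 15", X = 0:15,
                        probs = dbinom(x = 0:15, size = 15, prob = prob.success))

# Stack the two data frames and fix the panel order (n = 6 first)
both.probs <- rbind(probs.n6, probs.n15)
both.probs$n.trials <- factor(both.probs$n.trials, levels = c("n = 6", "n = 15"))

# One panel per number of trials, sharing the same axes
compare.plot <- ggplot(both.probs, aes(y = probs, x = X)) +
  geom_bar(stat = "identity", fill = "lightgrey", colour = "black") +
  facet_wrap(~ n.trials) +
  xlab("Number of successes (X)") +
  ylab("Pr[X]") +
  theme_bw()
compare.plot
```

With shared axes, it is easier to see that the distribution for *n* = 15 is shifted to the right and more spread out than the one for *n* = 6.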

- Challenge: Binomial probabilities

  - Use the `dbinom` function to calculate the probability of rolling three “2”s when rolling a fair six-sided die 20 times.
  - Produce a graph of a discrete probability distribution for this scenario: *p* = 1/4, and *n* = 12.