Blumberg's Summer Mathematics Notes for Week #4 (2001)

Foundations of Mathematics
Summer Program , Columbia University
July 2001 -- Week #4 Notes
Roger B. Blumberg

Last week, we learned about permutations and combinations and then tried to generalize the "counting" or "subset making" approach to as many different kinds of problems as we could. Here's another use for the binomial coefficient:

The number of positive integer solutions to the equation v + w + x + y + z = 35 is given by the binomial coefficient C(34,4).

Why? Can we generalize this idea for the equation x[1] + x[2] + .... + x[n] = k ?

Finally, although our discussion of Pascal's Triangle showed how the mathematics of counting can help us identify patterns, you might wonder how the mathematics of combinations can help us with proofs. Here's one example: Prove that the product of any five consecutive integers is divisible by 5! While we could of course prove this by induction, it may be easier to prove it directly:

If n is the largest of the five integers, then we need to prove that n(n-1)(n-2)(n-3)(n-4) is divisible by 5!
We know that C(n,r) is an integer for all integers n > r.
C(n,5) = [n(n-1)(n-2)(n-3)(n-4)] / 5!
Therefore, the product of any five consecutive integers is divisible by 5!

Take a moment to digest that, think about how we could generalize this to show that the product of any k consecutive integers will be divisible by k!, and then we'll move on to the transition from counting to calculating probabilities.

1. Suppose you are dealt 5 cards from a fair deck of 52. How many different ways are there to get a straight? What is the probability of getting a straight?

The difference between the answer to the first question (C(9,1)*[C(4,1)^5]) and the second question will just be that in the second case we put the answer over C(52,5), which gives the total number of 5 card hands that are possible. This leads to the basic definition of the probability of a discrete event, E,:

(# of successful
outcomes)/(# of possible outcomes)

2. Suppose that two students are chosen from this class at random. How many different ways might one chose two female students? If the two students are chosen at random, what is the probability that both students are male? That exactly one is male and one is female?

Here again we use binomial coefficients to calculate both the number of successful cases and the total number of possible cases. Thus, if we have a class with 12 females and 8 males, the probability that we have chosen two male students is: C(8,2)/C(20,2).

A. The Concepts of Discrete Probability

We begin by distinguishing elementary, or simple, events from complex events. For example, if we roll a fair die the event "roll a 4" is simple, while the event "roll an odd number" is complex (i.e. made up of more than one simple events). Now, in addition to the basic definition of probability above we have the following fundamental ideas:

The Outcome Space

Conceptualizing the outcome space (i.e. the set of possible outcomes) is perhaps the most important step in calculating (discrete) probabilities. Consider the calculation of p(prime) in the roll of one die; p(sum=10) in the roll of two dice; p(3H and 2T) in five flips of a coin.

The importance of being clear about the outcome space in the case of complex events and repeated trials.

Suppose we reconsider the examples if the dice are "loaded", with p(i) = i/21.

The difference between uniform and non-uniform likelihood in conceptualizing the outcome space.

The Axioms

p(ø) = 0; p(U) = 1

This may seem trivial, but once we realize that it implies that p(E) = 1 - p(~E) it is extremely powerful.

Example 1: Suppose you roll four fair dice. What is the probability that the sum is less than 24? Once we realize that p(sum<24) + p(sum=24) = 1, the answer is easily computed.

Example 2: Suppose you have ten addressed envelopes and ten personalized invitations, and the invitations are sorted into the envelopes at random; what is the probability that at least one invitation is in the wrong envelope?

p(A v B) = p(A) + p(B) - p(A & B)

Suppose two fair dice are rolled: What is the probability their sum is prime or that you've rolled doubles?

p(A & B) = p(A) x p(B) [for independent events A and B]

Suppose you roll a die and flip a coin. What is the probability you rolled a 5 and flipped a Tail? Notice how the definition captures the size of the outcome space in the denominator.

B. Repeated trials and the mathematics of counting.

Which of the following strings of coin flips is most likely if the coin is fair?:

HHHTTTHHTT

TTTTTTTTTTT

HTHTHTHTHT

Then why, given ten flips of a fair coin, do we think 5H & 5T any more likely than 10H? Given 12 rolls of a fair die, why is 2 of each outcome more likely than 12 fours?

We realize that, for repeated trials, the probability of a complex event will have to take into account both the number of ways the event can occur and the probability of each instance of the event as follows:

p(E) = (# ways E can occur) * (the probability of each instance of E)

We modify the binomial formula to accommodate this insight, by realizing that now each of the "bins" or "subsets" can be represented as having a probability attached to it. Thus, for n repeated flips of a coin, the probability of getting r heads is:

We can also modify the multinomial formula, and find that for n repeated rolls of a die, the probability of getting exactly: r(1) 1s; r(2) 2s; r(3) 3s; r(4) 4s; r(5) 5s; and r(6) 6s is:

C(n,r1,r2,..r6) * p(1)^r1 * p(2)^r2 * ... *
p(6)^r6

For example, suppose we go back to the insects we were talking about at the start of the unit on counting. Suppose we have 20 ants, 30 bees and 50 roaches all in one box, and we choose 10 insects at random (all at once). What is the probability we choose 2 ants, 3 bees, and 5 roaches? Before calculating, do you think the probability greater than .5? Do you think this outcome the most likely to occur? How can you reconcile the small probability with the fact that it is the most likely outcome?

We now have a mathematical theory of probability which incorporates our theories about counting.

C. Conditional Probability and Bayes Theorem

Recognizing the importance of the outcome space in all of our calculations, it is easy to understand why having information restricts the outcome space allows us to make more accurate probability judgements. Consider how your answers would differ in the following cases:

Two fair dice are rolled. What is the probability that the outcome is double sixes?
Two fair dice are rolled and the outcome is divisible by 3. What is the probability that the outcome is double sixes?

Clearly the outcome space is more restricted in the second case, and thus our answer in the second case is different than in the first. (Aside: can you think of a piece of information that would have made us judge the possibility lower in the second case?)

We can use this example to derive a formula for conditional probability:

p(A|B) = p(A and B) / p(B)

Finally, we can build on this idea and come up with a method for dealing with cases in which a particular event can occur in any of several subsets. For example, consider a high school population in which some of the members of each grade are smokers. Suppose we want to figure out the probability of choosing a smoker if we randomly choose a student from the school.

Bayes Theorem states that if a finite outcome space is completely divided into disjoint subsets (e.g. s[1], s[2], s[3], ..., s[n]), then for any event, E, in the outcome space:

p(E) = p(E|s[1])p(s[1]) + p(E|s[2])p(s[2]) + ...... + p(E|s[n])*p(s[n])

Review

A. In order to review the basic of probability from last time, and the way the counting methods we studied last week are integrated into combinatorial probability we look at four problems:

A fair die is rolled. What is the probability of getting a 4? an even number?
Two fair dice are rolled. What is p(sum=5)? p(the products of the dies is prime)?
A fair die is rolled 7 times. What is p(exactly three 4s)?
If a fair die is rolled sixty times, which has the greater probability: p(thirty 3s) or p(ten 3s)?

B. In preparation for the final exam, in past years I've solicited questions from students. Here are some of the kinds of questions that were asked.

How many ways are there to choose 3 letters from MASSACHUSETTS

Unlike the permutation problems involving repeated letters, we cannot assume that every particular combination of three letters will be duplicated the same number of times. For example, C(13,3) would not count the combination MCH more than once, but would count MCT twice, and would count MSC four times. Therefore, we have two choices: 1) calculate C(13,3) and subtract the number of combinations we've counted more than once; or 2) realize that, since there are only 7 different letters in MASSACHUSETTS, we can calculate C(7,3) and then add on all those combinations we left out. If we choose the second approach, we see that the combinations we've left out are just those with two As (there are 6 of these), two Ss (there are 6 of these too), two Ts (again, there are 6), and the single combination of SSS.

Prove, by mathematical induction, that C(2n,2) = 2 * C(n,2) + n^2

I. Since the case of n=1 doesn't make much sense (what is C(1,2)?), we consider n=2. We get C(4,2) = 2 * C(2,2) + 2^2 = 2 * 1 + 4 = 6, which equals C(4,2).

II. We want to show that

if: C(2n,2) = 2 * C(n,2) + n^2
then: C(2[n+1],2) = 2 * C([n+1],2) + [n+1]^2

C(2n+2, 2) - C(2n, 2) = 2*(C(n+1, 2)) + (n+1)^2 - [2*C(n,2)+n^2]

(2n+2)!/(2n!*2!) - [2n!/((2n-2)!*2!)] = (n+1)^2 - n^2 + 2(C(n+1,2) - C(n,2))

(2n+2)!/(2n!*2!) - [2n!/((2n-2)!*2!)] = 2n +1 + 2(C(n+1,2) - C(n,2))

Finding common denominators and expanding the left side, and rewriting the combination of (n+1) elements on the right side, in terms of the combination of n elements (recalling the equality from Pascal's Triangle), we get:

[(2n+2)(2n+1)(2n!)]/(2n!*2!) - [((2n)(2n-1)(2n!))/(2n!*2!)] = 2n+1+2(C(n,2)+C(n,1)-C(n,2))

(4n^2 + 6n + 2)/ 2 - [(4n^2 - 2n)/2] = 2n+1+2(n)
(8n + 2)/2 = 2n + 1 + 2n = 4n+1

If you flip a fair coin 8 times, what is the probability of getting exactly 6 heads and 2 tails?

This is a clear example of repeated trials, and so the probability must take into account both the number of ways we can get 6 heads in 8 flips, and the probability of each way. We therefore calculate p(6H and 2T) = C(8,6) * (1/2)^8. Of course, we could have used C(8,2) in this calculation as well.

If you roll a fair die eight times, what is the probability of getting exactly four 3s?

Another case of repeated trials, but here we need to distinguish between the probability of two outcomes: 3 and ~3. Every successful outcome of eight rolls (i.e. every outcome that contains exactly three heads) will contain five "not-3" rolls. Therefore, the probability of each successful outcome is p(3)^3 * P(~3)^5 = (1/6)^3 * (5/6)^5. Thus the probability of getting exactly three heads in eight rolls is: C(8,3) * (1/6)^3 * (5/6)^5.

Which of the following do you think has the higher probability:
1. Getting exactly 5 heads and 5 tails in 10 flips of a fair coin.
2. Getting exactly 50 heads and 50 tails in 100 flips of a fair coin
Although intuitions differ about the answer, if we consider the differences in the size of the outcome spaces in each experiment, and remember that the sum of all the possible outcomes must add to 1, it will (may?) become clear that the case with the smaller number of flips has the higher probability.
Now consider which of the following has higher probability:
1. Getting 45-55% heads in 10 flips of a fair coin
2. Getting 45-55% heads in 100 flips of a fair coin
What makes this comparison different than the first? Is it surprising to find that the case with the larger number of flips has the higher probability here?
Other questions??

Now to the final exam.

p(A|B) = p(A and B) / p(B)

p(E) = p(E|s[1])*p(s[1]) + p(E|s[2])*p(s[2]) + ...... + p(E|s[n])*p(s[n])

p(E) = p(E|s[1])p(s[1]) + p(E|s[2])p(s[2]) + ...... + p(E|s[n])*p(s[n])