Foundations of Mathematics
Summer Program, Columbia University
July 1999 -- Week #4 Notes
Roger B. Blumberg
From Counting to Probabilities
Last week, we learned about permutations and combinations and then tried to generalize the "counting" or "subset making" approach to as many different kinds of problems as we could. Here's another use for the binomial coefficient:
Finally, although our discussion of Pascal's Triangle showed how the mathematics of counting can help us identify patterns, you might wonder how the mathematics of combinations can help us with proofs. Here's one example: prove that the product of any five consecutive integers is divisible by 5!. While we could of course prove this by induction, it may be easier to prove it directly: the product of the five consecutive integers n+1, n+2, n+3, n+4, n+5 equals (n+5)!/n!, and dividing it by 5! gives (n+5)!/(n!*5!) = C(n+5,5). Since C(n+5,5) counts the 5-element subsets of an (n+5)-element set, it must be a whole number, and so the product is divisible by 5!.
Take a moment to digest that, think about how we could generalize this to show that the product of any k consecutive integers will be divisible by k!, and then we'll move on to the transition from counting to calculating probabilities.
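Here is a quick numerical check of the generalized claim (a Python sketch, not part of the original notes): the product of k consecutive integers, divided by k!, is just a binomial coefficient, and hence always a whole number.

```python
from math import comb, factorial

# The product of the k consecutive integers n+1, ..., n+k equals
# (n+k)!/n! = k! * C(n+k, k), so dividing by k! always gives an integer.
def product_of_consecutive(n, k):
    p = 1
    for i in range(n + 1, n + k + 1):
        p *= i
    return p

for n in range(0, 50):
    for k in range(1, 8):
        prod = product_of_consecutive(n, k)
        assert prod % factorial(k) == 0
        assert prod // factorial(k) == comb(n + k, k)
print("product of k consecutive integers is divisible by k! in every case checked")
```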
1. Suppose you are dealt 5 cards from a fair deck of 52. How many different ways are there to get a straight? What is the probability of getting a straight?
The difference between the answer to the first question (C(9,1)*[C(4,1)^5]) and the answer to the second will just be that in the second case we put that count over C(52,5), the total number of 5-card hands that are possible. This leads to the basic definition of the probability of a discrete event E: when all outcomes are equally likely, p(E) = (the number of outcomes in which E occurs) / (the total number of possible outcomes).
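A quick check of the arithmetic in Python (a sketch, using the count of successful hands given in the notes):

```python
from math import comb

# Probability of a straight in a 5-card hand, using the count from the notes:
# C(9,1) choices for the straight's ranks, and C(4,1) suit choices per card.
straights = comb(9, 1) * comb(4, 1) ** 5   # successful hands
total_hands = comb(52, 5)                  # all possible 5-card hands
print(straights, total_hands, straights / total_hands)
# 9216 2598960 -> roughly 0.0035
```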
2. Suppose that two students are chosen from this class at random. How many different ways might one choose two female students? If the two students are chosen at random, what is the probability that both students are male? That exactly one is male and one is female?
Here again we use binomial coefficients to calculate both the number of successful cases and the total number of possible cases. Thus, if we have a class with 12 females and 8 males, the probability that we have chosen two male students is: C(8,2)/C(20,2).
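In Python (a quick check, assuming the class of 12 females and 8 males described above):

```python
from math import comb

females, males = 12, 8
total = comb(females + males, 2)          # C(20,2) ways to choose any two students

p_two_males = comb(males, 2) / total                      # C(8,2)/C(20,2)
p_two_females = comb(females, 2) / total                  # C(12,2)/C(20,2)
p_one_each = (comb(males, 1) * comb(females, 1)) / total  # C(8,1)*C(12,1)/C(20,2)

print(p_two_males, p_two_females, p_one_each)
print(p_two_males + p_two_females + p_one_each)   # the three cases sum to 1
```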
A. The Concepts of Discrete Probability
We begin by distinguishing elementary, or simple, events from complex events. For example, if we roll a fair die the event "roll a 4" is simple, while the event "roll an odd number" is complex (i.e. made up of more than one simple event). Now, in addition to the basic definition of probability above, we have the following fundamental ideas:
Conceptualizing the outcome space is perhaps the most important step in calculating (discrete) probabilities. Consider the calculation of p(prime) in the roll of one die; p(sum=10) in the roll of two dice; p(3H and 2T) in five flips of a coin.
Now suppose we reconsider these examples when the dice are loaded so that p(i) = i/21. The outcomes are no longer equally likely, so instead of simply counting outcomes we must add up the probabilities of the individual outcomes in the event.
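To make the outcome spaces explicit, here is a sketch (not from the notes) that enumerates them directly, and then handles the loaded die by summing probabilities instead of counting:

```python
from itertools import product
from fractions import Fraction

# Fair die: p(prime) by enumerating the six equally likely faces.
p_prime = Fraction(len([f for f in range(1, 7) if f in (2, 3, 5)]), 6)

# Two fair dice: p(sum = 10) over all 36 equally likely pairs.
p_sum10 = Fraction(sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == 10), 36)

# Five flips of a fair coin: p(exactly 3 heads) over the 2^5 equally likely strings.
p_3h2t = Fraction(sum(1 for flips in product("HT", repeat=5) if flips.count("H") == 3), 2 ** 5)

# Loaded die with p(i) = i/21: the faces are no longer equally likely,
# so we sum the probabilities of the faces in the event instead of counting.
p_prime_loaded = sum(Fraction(i, 21) for i in (2, 3, 5))

print(p_prime, p_sum10, p_3h2t, p_prime_loaded)   # 1/2, 1/12, 5/16, 10/21
```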
Another fundamental idea is that the probabilities of all the possible outcomes must sum to 1. This may seem trivial, but once we realize that it implies p(E) = 1 - p(~E), it is extremely powerful.
Example 1: Suppose you roll four fair dice. What is the probability that the sum is less than 24? Once we realize that p(sum<24) + p(sum=24) = 1, the answer is easily computed: the only way to avoid a sum less than 24 is to roll four 6s, so p(sum=24) = (1/6)^4 = 1/1296 and p(sum<24) = 1295/1296.
Example 2: Suppose you have ten addressed envelopes and ten personalized invitations, and the invitations are sorted into the envelopes at random; what is the probability that at least one invitation is in the wrong envelope?
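Both examples yield to the complement rule p(E) = 1 - p(~E); a sketch of the arithmetic:

```python
from fractions import Fraction
from math import factorial

# Example 1: four fair dice. The only way the sum is NOT less than 24 is four 6s.
p_sum_24 = Fraction(1, 6) ** 4                 # 1/1296
p_sum_less_than_24 = 1 - p_sum_24              # 1295/1296

# Example 2: ten invitations stuffed at random. The complement of
# "at least one is wrong" is "all ten are right", which is 1 of the 10! orderings.
p_all_correct = Fraction(1, factorial(10))
p_at_least_one_wrong = 1 - p_all_correct       # 1 - 1/3628800

print(p_sum_less_than_24, p_at_least_one_wrong)
```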
Suppose two fair dice are rolled: What is the probability their sum is prime or that you've rolled doubles?
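Enumerating the 36 equally likely outcomes (a sketch, not part of the notes) makes the overlap between the two events visible; the count of "prime sum or doubles" is the sum of the two counts minus the count of the overlap:

```python
from itertools import product
from fractions import Fraction

PRIMES = {2, 3, 5, 7, 11}
rolls = list(product(range(1, 7), repeat=2))   # all 36 equally likely outcomes

prime_sum = [r for r in rolls if sum(r) in PRIMES]
doubles = [r for r in rolls if r[0] == r[1]]
overlap = [r for r in doubles if sum(r) in PRIMES]
either = [r for r in rolls if sum(r) in PRIMES or r[0] == r[1]]

# 15 prime-sum rolls + 6 doubles - 1 overlap ((1,1)) = 20 rolls, so p = 20/36 = 5/9
print(len(prime_sum), len(doubles), len(overlap), Fraction(len(either), 36))
```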
Suppose you roll a die and flip a coin. What is the probability you rolled a 5 and flipped a Tail? There are 6 * 2 = 12 equally likely outcomes and only one of them is (5, Tail), so the probability is 1/12 = (1/6) * (1/2). Notice how the definition captures the size of the outcome space in the denominator.
Which of the following strings of ten coin flips is most likely if the coin is fair?
TTTTTTTTTT
HTHTHTHTHT
Then why, given ten flips of a fair coin, do we think 5H & 5T any more likely than 10H? Given 12 rolls of a fair die, why is 2 of each outcome more likely than 12 fours?
We realize that, for repeated trials, the probability of a complex event will have to take into account both the number of ways the event can occur and the probability of each instance of the event as follows:
We modify the binomial formula to accommodate this insight, by realizing that each of the "bins" or "subsets" can now be represented as having a probability attached to it. Thus, for n repeated flips of a coin, the probability of getting exactly r heads is: C(n,r) * p(H)^r * p(T)^(n-r), which for a fair coin is just C(n,r) * (1/2)^n.
We can also modify the multinomial formula, and find that for n repeated rolls of a die, the probability of getting exactly r(1) 1s, r(2) 2s, r(3) 3s, r(4) 4s, r(5) 5s, and r(6) 6s is: [n!/(r(1)!*r(2)!*r(3)!*r(4)!*r(5)!*r(6)!)] * p(1)^r(1) * p(2)^r(2) * ... * p(6)^r(6), which for a fair die reduces to [n!/(r(1)!*...*r(6)!)] * (1/6)^n.
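Neither formula is hard to compute. Here is a sketch in Python (the helper names binomial_prob and multinomial_prob are mine, not from the notes); it also answers the question above about 12 rolls of a fair die, showing that 2 of each face is far more likely than twelve 4s:

```python
from math import comb, factorial

def binomial_prob(n, r, p):
    """Probability of exactly r successes in n independent trials,
    each succeeding with probability p: C(n,r) * p^r * (1-p)^(n-r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def multinomial_prob(counts, probs):
    """Probability of exactly counts[i] outcomes of type i in sum(counts)
    independent trials: n!/(r1!*...*rk!) * p1^r1 * ... * pk^rk."""
    n = sum(counts)
    denom = 1
    for r in counts:
        denom *= factorial(r)
    prob = factorial(n) // denom
    for r, p in zip(counts, probs):
        prob *= p ** r
    return prob

print(binomial_prob(10, 5, 0.5))                           # 5 heads in 10 flips ~ 0.246
print(multinomial_prob([2] * 6, [1 / 6] * 6))              # 2 of each face in 12 rolls ~ 0.0034
print(multinomial_prob([0, 0, 0, 12, 0, 0], [1 / 6] * 6))  # twelve 4s ~ 4.6e-10
```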
For example, suppose we go back to the insects we were talking about at the start of the unit on counting. Suppose we have 20 ants, 30 bees and 50 roaches all in one box, and we choose 10 insects at random (all at once). What is the probability we choose 2 ants, 3 bees, and 5 roaches? Before calculating, do you think the probability is greater than .5? Do you think this outcome is the most likely to occur? How can you reconcile the small probability with the fact that it is the most likely outcome?
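Because the 10 insects are chosen all at once (without replacement), the exact calculation is a counting one: the number of successful selections over C(100,10). A sketch of the arithmetic, which also shows that the multinomial formula with p = 0.2, 0.3, 0.5 gives a very similar answer here:

```python
from math import comb, factorial

# 20 ants, 30 bees, 50 roaches; choose 10 at random, all at once (no replacement).
exact = comb(20, 2) * comb(30, 3) * comb(50, 5) / comb(100, 10)

# For comparison: the multinomial formula, treating each draw as an
# independent trial with p(ant)=0.2, p(bee)=0.3, p(roach)=0.5.
multinomial_coeff = factorial(10) // (factorial(2) * factorial(3) * factorial(5))
approx = multinomial_coeff * 0.2 ** 2 * 0.3 ** 3 * 0.5 ** 5

# Both are well under 0.1, even though (as the notes say) 2-3-5 is the most likely split.
print(exact, approx)
```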
We now have a mathematical theory of probability which incorporates our theories about counting.
C. Conditional Probability and Bayes' Theorem
Recognizing the importance of the outcome space in all of our calculations, it is easy to understand why having information that restricts the outcome space allows us to make more accurate probability judgements. Consider how your answers would differ in the following cases:
Clearly the outcome space is more restricted in the second case, and thus our answer in the second case is different from our answer in the first. (Aside: can you think of a piece of information that would have made us judge the probability lower in the second case?)
We can use this example to derive a formula for conditional probability: p(A|B) = p(A and B) / p(B), i.e. the probability of A given that B has occurred is the probability that both A and B occur, divided by the probability of B.
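As a concrete illustration of the formula (this is not the example from class): compare p(sum = 10) for two fair dice with and without the information that the roll came up doubles:

```python
from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=2))
sum10 = [r for r in rolls if sum(r) == 10]
doubles = [r for r in rolls if r[0] == r[1]]
both = [r for r in doubles if sum(r) == 10]

p_sum10 = Fraction(len(sum10), 36)                       # no extra information: 3/36
# p(sum=10 | doubles) = p(sum=10 and doubles) / p(doubles)
p_sum10_given_doubles = Fraction(len(both), 36) / Fraction(len(doubles), 36)
print(p_sum10, p_sum10_given_doubles)                    # 1/12 vs. 1/6
```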
Finally, we can build on this idea and come up with a method for dealing with cases in which a particular event can occur in any of several subsets. For example, consider a high school population in which some of the members of each grade are smokers. Suppose we want to figure out the probability of choosing a smoker if we randomly choose a student from the school.
Bayes' Theorem states that if a finite outcome space is completely divided into disjoint subsets (e.g. s[1], s[2], s[3], ..., s[n]), then for any event, E, in the outcome space: p(E) = p(E|s[1])*p(s[1]) + p(E|s[2])*p(s[2]) + ... + p(E|s[n])*p(s[n]), and for each subset s[i], p(s[i]|E) = p(E|s[i])*p(s[i]) / p(E).
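As an illustration of both pieces (the enrollments and smoking rates below are made up for the sake of the example, not taken from the notes), we can compute p(smoker) by summing over the grades, and then p(grade | smoker) by Bayes' Theorem:

```python
# Hypothetical numbers for the high-school example.
grades = {
    "9th":  {"students": 300, "p_smoker": 0.05},
    "10th": {"students": 250, "p_smoker": 0.10},
    "11th": {"students": 250, "p_smoker": 0.15},
    "12th": {"students": 200, "p_smoker": 0.20},
}
total = sum(g["students"] for g in grades.values())

# Total probability: p(smoker) = sum over grades of p(grade) * p(smoker | grade)
p_smoker = sum(g["students"] / total * g["p_smoker"] for g in grades.values())

# Bayes' Theorem: p(12th | smoker) = p(smoker | 12th) * p(12th) / p(smoker)
p_12th = grades["12th"]["students"] / total
p_12th_given_smoker = grades["12th"]["p_smoker"] * p_12th / p_smoker

print(p_smoker, p_12th_given_smoker)
```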
A. In order to review the basics of probability from last time, and the way the counting methods we studied last week are integrated into combinatorial probability, we look at four problems:
B. In preparation for the final exam last year, I solicited questions from students. Here are some of the kinds of questions that were asked.
One question asked how many different combinations of three letters can be made from the letters of MASSACHUSETTS. Unlike the permutation problems involving repeated letters, we cannot assume that every particular combination of three letters will be duplicated the same number of times. For example, C(13,3) would not count the combination MCH more than once, but would count MCT twice, and would count MSC four times. Therefore, we have two choices: 1) calculate C(13,3) and subtract the number of combinations we've counted more than once; or 2) realize that, since there are only 8 different letters in MASSACHUSETTS, we can calculate C(8,3) and then add on all those combinations we left out. If we choose the second approach, we see that the combinations we've left out are just those with two As (there are 7 of these), two Ss (there are 7 of these too), two Ts (again, there are 7), and the single combination SSS, for a total of C(8,3) + 21 + 1 = 78.
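A brute-force check of this count (a sketch; it simply enumerates every distinct three-letter combination):

```python
from itertools import combinations
from math import comb

letters = sorted("MASSACHUSETTS")          # 13 letters, 8 distinct (A, C, E, H, M, S, T, U)
distinct_combos = {tuple(sorted(c)) for c in combinations(letters, 3)}

# C(8,3) all-different combinations, plus the pairs (AA_, SS_, TT_) and the triple SSS.
by_counting = comb(8, 3) + 3 * 7 + 1
print(len(distinct_combos), by_counting)   # both 78
```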
I. Since the case of n=1 doesn't make much sense (what is C(1,2)?), we take n=2 as the base case for the claim that C(2n,2) = 2*C(n,2) + n^2. We get 2*C(2,2) + 2^2 = 2*1 + 4 = 6, which equals C(4,2).
II. For the inductive step, we subtract the identity assumed for n from the identity we want to establish for n+1, so it suffices to show that
(2n+2)!/((2n)!*2!) - (2n)!/((2n-2)!*2!) = (n+1)^2 - n^2 + 2(C(n+1,2) - C(n,2))
(2n+2)!/((2n)!*2!) - (2n)!/((2n-2)!*2!) = 2n + 1 + 2(C(n+1,2) - C(n,2))
Finding common denominators and expanding the left side, and rewriting the combination of (n+1) elements on the right side in terms of the combination of n elements (recalling from Pascal's Triangle that C(n+1,2) = C(n,2) + n), we get:
(4n^2 + 6n + 2)/2 - (4n^2 - 2n)/2 = 2n + 1 + 2n
(8n + 2)/2 = 2n + 1 + 2n = 4n + 1. Both sides reduce to 4n + 1, so the inductive step holds and the identity is proved.
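The identity being proved, C(2n,2) = 2*C(n,2) + n^2, is also easy to spot-check numerically (a sketch):

```python
from math import comb

# Spot-check the identity C(2n, 2) = 2*C(n, 2) + n^2 for the first few n.
for n in range(2, 20):
    assert comb(2 * n, 2) == 2 * comb(n, 2) + n ** 2
print("identity holds for n = 2, ..., 19")
```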
This is a clear example of repeated trials, and so the probability must take into account both the number of ways we can get 6 heads in 8 flips, and the probability of each way. We therefore calculate p(6H and 2T) = C(8,6) * (1/2)^8. Of course, we could have used C(8,2) in this calculation as well.
Another case of repeated trials, but here we need to distinguish between the probabilities of two outcomes: 3 and ~3 (not-3). Every successful outcome of eight rolls (i.e. every outcome that contains exactly three 3s) will contain five "not-3" rolls. Therefore, the probability of each successful outcome is p(3)^3 * p(~3)^5 = (1/6)^3 * (5/6)^5. Thus the probability of getting exactly three 3s in eight rolls is: C(8,3) * (1/6)^3 * (5/6)^5.
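Both repeated-trials calculations in Python (a sketch):

```python
from math import comb

# Exactly 6 heads in 8 flips of a fair coin.
p_six_heads = comb(8, 6) * (1 / 2) ** 8                      # C(8,6) * (1/2)^8 ~ 0.109

# Exactly three 3s in 8 rolls of a fair die.
p_three_threes = comb(8, 3) * (1 / 6) ** 3 * (5 / 6) ** 5    # ~ 0.104

print(p_six_heads, p_three_threes)
```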
Although intuitions differ about the answer, if we consider the differences in the size of the outcome spaces in each experiment, and remember that the probabilities of all the possible outcomes must add to 1, it will (may?) become clear that the case with the smaller number of flips has the higher probability.
Now consider which of the following has higher probability:
What makes this comparison different than the first? Is it surprising to find that the case with the larger number of flips has the higher probability here?
Now to the final exam.