The Birthday Problem and Other Chance Encounters
The Birthday Problem is a well-known favorite in Probability because it is easy to understand, yet the answer seems counter-intuitive. It is normally stated as:
How many people, chosen at random, are needed so that the probability that there exists at least one pair who share the same birthday is greater than 50%?
Note a few things: "birthday" means month and day only, not the year. "chosen at random" means the people are chosen with no intentional bias. For example, we would avoid choosing twins since we'd know beforehand they share the same birthday. "at least one pair" leaves open the possibility more than one pair may exist, or three people share a common birthday, or any combination thereof. More importantly, "at least one pair" follows the rules of logical quantifiers, as we will see in a moment. Also, the leap day is not counted, or, if one insists, the leap day February 29 is "paired" with March 1. This does not change the outcome of the problem at all, so whether to include the leap day or ignore it makes no difference.
In a problem like this, it is difficult to account for all possible ways birthdays can be shared, e.g. one pairing, more than one pairing, three or more people sharing a common birthday, and so on. All of these cases fall under the umbrella of "at least one". The logical negation of "at least one" is "no/none", and vice-versa. It is easier to find the probability that no one shares a birthday with any other in the group. What we'll do is find the minimum number of people necessary so that the probability that no one shares a birthday is below 50%. Thus, the negation, "at least one pair shares a birthday", will be above 50%.
The usual method is to brute force a solution. We start with one person. That person has a birthday. A second person is randomly chosen. The probability he or she does not share the same birthday as the first person is 364/365. A third person is randomly chosen. Since the first two people have "used up" two of the possible days, the probability that this third person does not share a birthday with the any of the first two people is 363/365. Thus, the probability that among these three people, there are no shared birthdays is the product (364/365)(363/365), which is about 0.992, or 99.2%. Therefore, the probability that there is a shared birthday among the three people is 1 − 0.992 = 0.08, or 0.8%, slightly less than 1 percent.
The same process continues. Assuming there are no shared birthdays among any pair of the first three people, then a randomly-chosen fourth person has a probability of 362/365 of having a different birthday than any of the first three, and the probability that among these four people, there are no pairs who share a common birthday is the product (364/365)(363/365)(362/365), which is about 0.984, or 98.4%, so that the probability that among these four people, there exists at least one pair who do share a common birthday is 1 − 0.984 = 0.016, or 1.6%.
Now that the structure of the problem is (hopefully) more evident, we can build a generalized approach. The structure is summaried in the table below:
No. of people (n) |
Probability of no shared birthday |
Prob of at least one shared birthday |
2 |
364/365 = 0.997, or 99.7% |
1 − 0.997 = 0.003, or 0.3% |
3 |
(364/365)(363/365) = 0.992, or 99.2% |
1 − 0.992 = 0.008, or 0.8% |
4 |
(364/365)(363/365)(362/365) = 0.984, or 98.4% |
1 − 0.984 = 0.016, or 1.6% |
5 |
(364/365)(363/365)(362/365)(361/365) = 0.973, or 97.3% |
1 − 0.973 = 0.027, or 2.7% |
To generalize this structure, note that for a given value n, the probability of no shared birthdays is
(364/365)(363/365)(362/365) ... .
There is actually a preceding factor in this string: 365/365. This represents the first person. This quotient is "1", meaning that among one person, the probability of no shared birthdays is 1, or certain. That should not be a surprise. However, including it will make the subsequent equation using permutations more sensible. Thus, what we really have is
(365/365)(364/365)(363/365)(362/365) ... .
Therefore, the probability that among n people, there are no shared birthdays is
P(365,n)/365n.
And finally, the probability that there exists at least one pair of people in the group who share a birthday is
1 − P(365,n)/365n.
The problem with this approach is that there is no way to analytically determine n other than to keep plugging in values of n until the second column's value drops below 0.5, so that the third column's value rises above 0.5.
This magical value n is 23. That means, in any group of 23 randonly-chosen people, the probability there exists at least one pair who share a birthday is above 50% (it's actually 0.507, or 50.7%). This seems counter-intuitive, since most people would think we need "a lot" of people in order to have the probability of at least one shared birthday to be above 50%. Below is a spreadsheet showing how often a shared birthday occurs among a group of 23 people. (click to enlarge)
Is there a way to find n without having to just plug in values until the desired answer is found?
We approach this problem from a different angle. There are actually two values we seek: n, the number of people, and k, the number of pairings of two people chosen from this group of n people. These values are related by the combination equation:
C(n,2) = k.
In other words, "n people, chosen 2 at a time, is k".
In a sense, k represents the number of "opportunities" of possible pairings. For example, in a group of 4 people, there are C(4,2) = 6 "opportunities" for a pairing. If the group consists of Al, Bob, Chuck and Dave, then there are 6 pairings: Al & Bob, Al & Chuck, Al & Dave, Bob & Chuck, Bob & Dave, and Chuck & Dave. It's important to note that the number of possible pairings, k, is always much greater than the number of people, n, in the group. For example, in a group of 10 people, there are C(10,2) = 45 possible pairings.
We will find k first, then work backwards to find n. As we did above, we will first find the probability of no pairings with shared birthdays, from which we can then infer, by logical negation, the probability of at least one pairing with a shared birthday.
Among any two people, the probability they do not share a birthday is 364/365. If there are k possible pairings, then we want each pairing to not share a birthday, or the quotient 364/365 multiplied to itself k times, that is, (364/365)k. And we want this value to be slightly below 0.5. The "easy" way to do this is simply to set (364/365)k equal to 0.5, and solve for k using logarithms:
(364/365)k = 0.5, so that k = ln(0.5)/ln(364/365) = 252.65.
Since k must be an integer, then k is either 252 or 253. Note that (364/365)252 = 0.5009 and that (364/365)253 = 0.4995. Since we wanted (364/365)k to be below 0.5, then k = 253 is the desired number of pairings for which there are no shared birthdays that we seek.
Now we find n. We use the equation C(n,2) = k. In general, C(n,2) = n(n − 1)/2. We set this equal to 253:
n(n − 1)/2 = 253.
Multiplying by 2 gives
n(n − 1) = 506.
Distributing to clear parentheses and combining all terms to one side gives
n2 − n − 506 = 0.
This is quadratic, and factoring gives
(n + 22)(n − 23) = 0
Thus, n = −22 or n = 23. The negative value is not considered. Thus, n = 23 randomly-chosen people is the minimum number of people necessary so that the probability at least one pair share a birthday is above 50%.
We got a little "lucky" in that this quadratic equation factored cleanly into two smaller factors with integer solutions. This doesn't always happen, as the next example shows.
Suppose we want to find the minimum number of people needed so that the probability of at least one pair sharing a birthday is 90%. We follow the same logic: we first seek the number of pairings, k, so that (364/365)k = 0.1, where 0.1 represents 10%, the probability that no pairing share a common birthday. This k is
(364/365)k = 0.1, so that k = ln(0.1)/ln(364/365) = 839.29.
Thus, k = 839 or 840. We choose the larger of the two, so k = 840.
Now, set C(n,2) = 840. We skip to the quadratic step:
n(n − 1)/2 = 840, so that n(n − 1) = 1680, which gives n2 − n − 1680 = 0.
Using the quadratic formula, n = [−(−1) ± √((−1)2 − 4(1)(−1680))]/2, which gives n = 41.49 and n = −40.49 as roots. The negative root is ignored. The problem is that n = 41.49 people does not make sense. It must be either 41 people, or 42 people.
Using the permutation notation (from above) to check n = 41, we get P(365,41)/36541 = 0.097, and when n = 42, we get P(365,42)/36542 = 0.086. In this case, we want P(365,n)/365n to be below 0.1 "for the first time", and this happens when n = 41. As a check, when n = 40, P(365,40)/36540 = 0.109, which is above 0.1.
Thus, in any group of 41 randomly-chosen people, the probability there exists at least one pair of people with a shared birthday is over 90%. Does this answer surprise you?
Below is a table in Excel that shows the probability of no shared birthdays, then the probability of at least one shared birthday, for a given number of people. At 23 people, the probability of a shared birthday is 50.7%. Interestingly, at 27 people, the probability rises above 60%, at 30 people the probability rises above 70%, at 35 people it hits 80% and at 41 people, it hits 90% as we just noted above. At 68 people, the probability of a shared birthday is nearly 99.9%. (click to enlarge)
A few closing comments:
- In a group of 365 people, the probability of at least one shared birthday is still just a hair below certain, as it is possible each of the 365 people have a different birthday from everyone else. But at 366 people, the probability of at least one shared birthday is certain. This should be intuitive. If, for some crazy reason, the first 365 people all had different birthdays, then the 366th person must have a birthday on a day already "taken" by one of the first 365 people. This is called the "pigeonhole principle".
- Sometimes, this problem gets mis-understood due to how it is stated. often in a way that is completely different from how it's supposed to be stated. For example, it is not a question of how many people are needed so that someone shares your birthday. That's one common erroneous interpretation.
- This problem can explain why you sometimes have chance encounters that seem so "unlikely" but are actually likelier than one would think. Given enough people and enough "opportunities" for a shared characteristic (e.g. visiting the same store, being at the same concert, etc), it's bound to happen rather often, as opposed to "almost never".
|