Thursday, March 30, 2006

A Little Lesson in Polling.

  1. The Game

  2. The Approach

  3. The Results

  4. Assumptions and Samples

  5. An Experiment on Randomness

  6. Margin of Error

  7. Another President

The Game

DJB Rizalist at Philippine Commentary poses the following game on his blog:
IMAGINE that you are a contestant in a Game Show whose prize is a cool $1 million.

An Olympic-sized swimming pool is filled with a pre-arranged mixture of regulation billiard balls (colored and numbered 1 to 15) and white cue balls. The object of the Game is to guess how many white cue balls there are. It has been announced that the TOTAL NUMBER of balls whether white or colored is exactly 100 million.

Contestants are given exactly one hour during which they can perform any physical, mental or statistical inspections on the collection of billiard balls. They are allowed pencil, paper and a computer with an Internet connection. Rulers, clocks, weighing scales, cameras, cell phones are also available. The winner is the contestant who comes closest to the correct answer in one hour.
Source: Philippine Commentary - Guess the Fraction of White Cue Balls
In his commentary he does some number crunching: at a rate of 1 ball per second, counting the lot by hand would take a person years (100 million seconds is a little over three years of nonstop counting), and he gives you only an hour. This, ladies and gentlemen, is what polling is all about.
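The arithmetic is easy to check. Here is a minimal Python sketch, assuming a steady pace of one ball per second around the clock and a 365-day year (both are assumptions, not part of the game's rules):

```python
# Assumptions: a steady 1 ball per second, counting around the clock,
# and a 365-day year.
TOTAL_BALLS = 100_000_000
BALLS_PER_SECOND = 1
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years_needed = TOTAL_BALLS / BALLS_PER_SECOND / SECONDS_PER_YEAR
balls_in_one_hour = BALLS_PER_SECOND * 60 * 60

print(f"counting everything: {years_needed:.2f} years")  # 3.17 years
print(f"one hour of counting: {balls_in_one_hour:,} balls "
      f"({balls_in_one_hour / TOTAL_BALLS:.4%} of the pool)")
```

An hour of honest counting covers only a few thousandths of a percent of the pool, so any answer has to come from a sample.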

The Approach

I suggested the following approach. The rules allow access to commonly available items, so I chose a bucket, a tape measure, paper, a pencil, and a calculator. Calculate (or somehow obtain) the volume of the pool (I assume the pool dimensions are available) and the volume of the bucket. Then walk around the pool, choose spots at random, dig out a bucketful of balls, and count and record both the number of balls and the number of cue balls in each bucket. Keep digging out samples at random spots throughout the pool (obviously we only have access to the surface) and repeat the counting.

Then, perhaps 20 minutes before the deadline (or whatever leaves sufficient time), stop sampling and work out the average number of cue balls found in a bucket (that's bouquet!). Multiply that average by (volume of pool / volume of bucket) and you should come close to the number of cue balls in the pool.
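Here is a minimal Python sketch of that calculation. The pool and bucket volumes and the bucket counts are made-up numbers for illustration, not measurements, and the sketch assumes the balls pack the bucket about as densely as they pack the pool:

```python
from statistics import mean

def estimate_cue_balls(cue_counts, pool_volume, bucket_volume):
    """Scale the average number of cue balls per bucket up to the whole pool.

    Assumes each bucket is filled level and the balls pack the bucket about
    as densely as they pack the pool.
    """
    avg_per_bucket = mean(cue_counts)
    buckets_per_pool = pool_volume / bucket_volume
    return avg_per_bucket * buckets_per_pool

# Hypothetical inputs: volumes in liters, cue-ball counts from 8 buckets.
POOL_VOLUME_L = 15_000_000
BUCKET_VOLUME_L = 10
cue_counts = [21, 24, 19, 23, 22, 25, 20, 22]

estimate = estimate_cue_balls(cue_counts, POOL_VOLUME_L, BUCKET_VOLUME_L)
print(f"estimated cue balls: {estimate:,.0f}")  # 33,000,000
```

Because the game announces the total (exactly 100 million), you can also skip the volumes: divide the cue balls counted by all balls counted and multiply by 100 million. That ratio does not depend on measuring the pool or the bucket at all.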

The Results

DJB then sets up a scenario in which I count 1,200 cue balls out of 3,600 balls sampled. That indicates about 1/3 of the balls are cue balls, so with 100 million balls in total one could reasonably estimate 33,333,333 cue balls in the pool. He goes further, assumes a normal approximation to a binomial distribution (terms which had specific meanings to me at one time, but which I now understand only in general), and calculates a margin of error of about 1.67%. That margin applies to the share of cue balls: the true share is likely within 1.67 percentage points of 33.3%. Since 1.67% of 100 million balls is about 1,666,667, the real answer is likely at least 31,666,667 and at most 35,000,000.
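A minimal Python sketch of that calculation follows. The z value of 1.96 (about 95% confidence) is my assumption, since the confidence level isn't stated. With it, the textbook formula gives about 1.54% for a share of 1/3, while the common rule of thumb 1/√n gives 1/√3600 = 1/60 ≈ 1.67%, which matches the figure quoted. Either way the margin is in percentage points of the whole, so it is applied to the 100 million total:

```python
import math

TOTAL_BALLS = 100_000_000
CUE_COUNTED, TOTAL_COUNTED = 1_200, 3_600
Z = 1.96  # assumed ~95% confidence level

def share_and_margin(hits, n, z=Z):
    """Sample proportion and its margin of error (normal approximation to the binomial)."""
    p_hat = hits / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z * standard_error

p_hat, moe = share_and_margin(CUE_COUNTED, TOTAL_COUNTED)
# Conservative shortcut: close to 1.96 * sqrt(0.25 / n), the worst case p = 0.5.
rule_of_thumb = 1 / math.sqrt(TOTAL_COUNTED)

print(f"share of cue balls: {p_hat:.4f}")               # 0.3333
print(f"95% margin (formula): {moe:.2%}")              # 1.54%
print(f"1/sqrt(n) rule of thumb: {rule_of_thumb:.2%}")  # 1.67%

# The margin is in percentage points, so apply it to the whole 100 million.
estimate = p_hat * TOTAL_BALLS
low = (p_hat - rule_of_thumb) * TOTAL_BALLS
high = (p_hat + rule_of_thumb) * TOTAL_BALLS
print(f"estimate {estimate:,.0f}, likely range {low:,.0f} to {high:,.0f}")
# estimate 33,333,333, likely range 31,666,667 to 35,000,000
```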

Assumptions and Samples

One assumption we have to make is that cue balls are scattered fairly uniformly throughout the depth of the pool. Our sample comes only from the top layer of balls, so if cue balls are more concentrated at the top or bottom of the pool, our results will be off. This is like going to Berkeley, CA in early 2004, asking people whom they wanted for President, and then claiming that survey reflects the whole country's preference. Of course it would be wrong. If that same survey also polled folks in Amarillo, Texas, as well as Berkeley, it would become more accurate. This is also one way to rig a poll: restrict your sample to a population that you know will give you the answer you want.

A poll or survey is only valid for the population from which the sample was taken. If the sample is taken only from Berkeley, California, the poll is valid only for Berkeley, California. If the sample is taken from random spots across America, it can be said to be valid for the American population.
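To see how badly a surface-only sample can miss, here is a small Python simulation. The pool is hypothetical and far smaller than the real one (50 layers of 2,000 balls), and I assume cue balls make up 60% of the top five layers but only 20% of the rest, so the true share is about 24%:

```python
import random

random.seed(2006)

# Hypothetical pool of 50 layers x 2,000 balls. As an assumption for the
# demo, cue balls make up 60% of the top 5 layers but only 20% of the rest.
LAYERS, BALLS_PER_LAYER, SAMPLE_SIZE = 50, 2_000, 1_000

def cue_share(layer):
    return 0.6 if layer < 5 else 0.2

# True means "this ball is a cue ball"; layer 0 is the surface.
pool = [[random.random() < cue_share(layer) for _ in range(BALLS_PER_LAYER)]
        for layer in range(LAYERS)]
all_balls = [ball for layer in pool for ball in layer]

true_share = sum(all_balls) / len(all_balls)
surface_share = sum(random.sample(pool[0], SAMPLE_SIZE)) / SAMPLE_SIZE
random_share = sum(random.sample(all_balls, SAMPLE_SIZE)) / SAMPLE_SIZE

print(f"true share:          {true_share:.3f}")     # about 0.24
print(f"surface-only sample: {surface_share:.3f}")  # about 0.60, badly biased
print(f"whole-pool sample:   {random_share:.3f}")   # close to the true share
```

The surface-only sample is the pool's version of polling only Berkeley. Taking more buckets from the top layer would not help; it would only make the estimate more confidently wrong.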

An Experiment on Randomness

Try this experiment to see how randomness can even things out. At the grocery store, as you place items in the cart, round each price to the nearest quarter ($0.25, 25¢) and keep a running total in your head. When you pay, your total (assuming your addition is correct) will usually be very close to the actual total. Why? Because chances are you round down about as often as you round up, and the errors cancel each other. Random sampling, provided the sample is large enough, works the same way: high, unrepresentative concentrations of an outcome in some samples are offset by low, unrepresentative concentrations in others.
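The grocery experiment can be simulated in a few lines of Python. The cart is hypothetical: 40 items with prices drawn uniformly between $0.99 and $9.99, an assumption that makes rounding up and rounding down equally likely:

```python
import random

random.seed(30)

def round_to_quarter(cents):
    """Round a price in cents to the nearest 25 cents.

    Whole-cent prices can never sit exactly halfway between two quarters,
    so there are no ties to break.
    """
    return 25 * round(cents / 25)

# Hypothetical cart: 40 items priced between $0.99 and $9.99.
prices = [random.randint(99, 999) for _ in range(40)]

exact_total = sum(prices)
rounded_total = sum(round_to_quarter(p) for p in prices)
worst_case = 12 * len(prices)  # every item off by 12 cents in the same direction

print(f"exact total:   ${exact_total / 100:.2f}")
print(f"rounded total: ${rounded_total / 100:.2f}")
print(f"difference: {abs(rounded_total - exact_total)} cents "
      f"(worst case {worst_case} cents)")
```

Each rounding error is at most 12 cents, so 40 items could in principle be off by $4.80, but because the errors point in both directions the actual difference is typically under a dollar.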

Margin of Error

Margin of error is a widely misunderstood number. It is a mathematical measure of the uncertainty that comes from examining a sample instead of the whole population. One way to decrease the margin of error is to increase the sample size. In the pool example, other sources of error may creep into the final result, such as uncertainties in the measured volumes of the pool and the bucket, but a poll of people involves no such physical measurements.
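A short Python sketch shows how the margin of error shrinks with sample size. It assumes 95% confidence (z = 1.96) and the worst case p = 0.5, a common convention for quoted poll margins, and it covers sampling error only, not errors such as mismeasuring the pool:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion, normal approximation.

    p = 0.5 gives the largest (most conservative) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 900, 1_600, 3_600, 14_400):
    print(f"n = {n:>6,}: +/- {margin_of_error(n):.2%}")
```

Quadrupling the sample only halves the margin of error, because the margin shrinks with the square root of the sample size.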

Margin of error refers to a single poll. Return to the pool example above: the margin of error is about 1.67%, so the true share of cue balls is likely to lie within ±1.67 percentage points of the estimate. The emphasis on likely is important, because the true answer could lie outside that range. Chances are, however, that it lies within it.
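What "likely" means can be checked by simulation: fix the true share in advance, run the same poll many times, and count how often the truth falls inside each poll's own margin of error. A minimal Python sketch, assuming 95% confidence (z = 1.96), truly random draws, and a true share of 1/3:

```python
import math
import random

random.seed(59)

TRUE_SHARE = 1 / 3  # in a simulation we can fix the "real" answer in advance
N = 3_600           # balls examined per poll
POLLS = 500         # number of independent repeat polls
Z = 1.96            # about 95% confidence

def run_poll():
    """Draw N balls at random and return the observed share of cue balls."""
    hits = sum(random.random() < TRUE_SHARE for _ in range(N))
    return hits / N

covered = 0
for _ in range(POLLS):
    p_hat = run_poll()
    moe = Z * math.sqrt(p_hat * (1 - p_hat) / N)
    if abs(p_hat - TRUE_SHARE) <= moe:
        covered += 1

coverage = covered / POLLS
print(f"{coverage:.1%} of polls landed within their margin of error")
```

Roughly 95% of the simulated polls capture the true share; the remaining 5% or so miss it even though nothing went wrong with them.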

An interesting conclusion I came to before the 2004 US presidential election concerned the polls showing George W. Bush in the lead. Those polls consistently had Bush ahead, but the margins of error brought Kerry's numbers and W's numbers into overlap; a straight margin-of-error analysis might have led a person to believe one could not tell who was really in the lead. One thing I noticed as more and more polls came out, though: W's numbers were consistently better than Kerry's, and the numbers varied very little from poll to poll (comparing each pollster with itself, i.e. Gallup to Gallup and Zogby to Zogby, not Gallup to Zogby). I concluded that W would win by about the margins the polls indicated, and that is what happened.
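One way to put numbers on that reasoning is a simple sign argument. If the race were truly tied and each poll were independent and unbiased, each would show W ahead about half the time, so several polls in a row all showing him ahead quickly becomes unlikely. This Python sketch assumes exactly that (independence, no systematic bias, and a perfect tie); real polls can share biases, so it overstates the certainty:

```python
def chance_all_show_same_leader(k, p_lead=0.5):
    """Probability that k independent polls all show the same candidate ahead.

    p_lead is the chance a single poll shows that candidate ahead; 0.5 models
    a dead heat with unbiased polls (an assumption, ignoring exact ties).
    """
    return p_lead ** k

for k in (1, 3, 5, 10):
    print(f"{k:>2} polls in a row: {chance_all_show_same_leader(k):.4f}")
```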

Another President

Mr. Rizalist presents the puzzle at a time when Filipinos are being polled on their views of Philippine President Gloria Macapagal Arroyo. I don't know her exact numbers, but most surveys on the matter that I have seen indicate that most Filipinos want her to resign. DJB does a good thing in educating his readers on the mechanics of statistical sampling.