The Randomness of Polls
One of the more poorly understood things about polls is how random they really are. For example, the obvious purpose of polling is to estimate as closely as possible what the outcome of the vote is going to be. But suppose we did know the outcome of the vote, because it was the day after the election. Well, then we could look back at the prior day or two's worth of polls and judge them by how well they stacked up against the actual vote.
I set up a simulation of a Pennsylvania poll of 726 likely voters (the number used in a recent Quinnipiac Poll). The simulation assumes that if you could poll every likely voter in Pennsylvania, the result would be Kerry 46%, Bush 44%, others 2%, and 8% (of the likely voters) not voting. For each of the 726 simulated voters I drew a random number, compared it to those percentages to allocate the voter to a candidate, and then tallied the poll's percentages. Then I ran the simulation 50 times. The results may seem a bit surprising. Kerry's polling varied from a low of 41.6% in run 33 of the simulation to 50.4% in run 39, while the President's numbers varied from 40.1% in run 39 to a high of 48.5% in run 29. In fully 1/4 of the polls the President was leading, when, based on the assumption, Kerry should be leading by 2 points.
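For anyone who wants to try this at home, here is a minimal Python sketch of a simulation along the lines described above. It is not the program I used for the numbers in this post; the function names and the seed are made up for illustration, and a different seed will give different highs and lows.

```python
import random

# Assumed "true" opinion, taken from the text above.
TRUE_SHARES = {"Kerry": 0.46, "Bush": 0.44, "Other": 0.02, "Not voting": 0.08}
SAMPLE_SIZE = 726   # likely voters per simulated poll
NUM_RUNS = 50       # number of simulated polls


def simulate_poll(rng, shares=TRUE_SHARES, n=SAMPLE_SIZE):
    """Draw n simulated respondents and return each option's share in percent."""
    names = list(shares)
    weights = [shares[name] for name in names]
    # Each respondent is assigned independently, in proportion to the true shares.
    picks = rng.choices(names, weights=weights, k=n)
    return {name: 100.0 * picks.count(name) / n for name in names}


def run_simulation(seed=2004, runs=NUM_RUNS):
    """Run several independent polls; the seed (hypothetical) makes runs repeatable."""
    rng = random.Random(seed)
    return [simulate_poll(rng) for _ in range(runs)]


if __name__ == "__main__":
    polls = run_simulation()
    kerry = [p["Kerry"] for p in polls]
    bush = [p["Bush"] for p in polls]
    bush_ahead = sum(b > k for k, b in zip(kerry, bush))
    print(f"Kerry: low {min(kerry):.1f}%, high {max(kerry):.1f}%")
    print(f"Bush:  low {min(bush):.1f}%, high {max(bush):.1f}%")
    print(f"Bush ahead in {bush_ahead} of {len(polls)} polls")
    print(f"Average: Kerry {sum(kerry) / len(kerry):.2f}%, "
          f"Bush {sum(bush) / len(bush):.2f}%")
```

Change the seed a few times and watch how much the individual polls bounce around while the average stays close to the assumed 46 and 44.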
The polling becomes more accurate in the aggregate; after 50 runs through the 726 likely voters, the overall average was 46.02% for Kerry and 43.98% for Bush. Of course, the problem is that nobody wants to actually poll some 36,000 people, so pollsters use the smaller samples, which are subject to wild fluctuations. For example, suppose one poll showed Kerry ahead by a full 10 points, and the next week's poll showed Kerry losing by almost 7 points. There would be a strong presumption that the campaign was falling apart, and every mistake that Kerry had made would be scrutinized in search of the cause. Yet it is completely within the realm of possibility that no movement actually occurred in the underlying public opinion; both of those results came out in the 50 runs I made, when the assumed public opinion was Kerry by two.
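The size of those swings can also be estimated without running anything 50 times. The sketch below uses the textbook standard-error formulas for a sampled share and for the gap between two shares drawn from the same sample, plus a normal approximation for the chance that the trailing candidate shows up ahead. The function names are mine, and the normal approximation is an assumption that ignores ties and rounding.

```python
import math
from statistics import NormalDist


def share_std_error(p, n):
    """Standard error of one candidate's sampled share (as a fraction)."""
    return math.sqrt(p * (1 - p) / n)


def lead_std_error(p_a, p_b, n):
    """Standard error of (share A - share B) when both come from the same sample."""
    return math.sqrt((p_a + p_b - (p_a - p_b) ** 2) / n)


kerry, bush = 0.46, 0.44      # assumed true shares from the text
single_poll = 726             # one poll
pooled = 50 * 726             # all 50 simulated polls combined

se_share = share_std_error(kerry, single_poll)
se_lead = lead_std_error(kerry, bush, single_poll)
se_share_pooled = share_std_error(kerry, pooled)

# Normal approximation: chance that a single poll shows Bush ahead.
prob_bush_ahead = NormalDist(mu=kerry - bush, sigma=se_lead).cdf(0.0)

print(f"Std. error of Kerry's share, one poll: {100 * se_share:.2f} points")
print(f"Std. error of the lead, one poll:      {100 * se_lead:.2f} points")
print(f"Std. error of Kerry's share, pooled:   {100 * se_share_pooled:.2f} points")
print(f"Approx. chance Bush leads in one poll: {prob_bush_ahead:.2f}")
```

Under these assumptions the standard error of the lead in a single 726-person poll comes out to roughly 3.5 points, larger than the 2-point lead itself, which is why it is no surprise that the President led in about a quarter of the simulated polls.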