I Smell a Rat

By Colin Shea
FreezerBox.com

I smell a rat. It has that distinctive and all-too-familiar odor of the species Republicanus floridius. We got a nasty bite from this pest four years ago and never quite recovered. Symptoms of a long-term infection are becoming distressingly apparent.

The first sign of the rat was on election night. The jubilation of early exit polling had given way to rising anxiety as states fell one by one to the Red Tide. It was getting late in the smoky cellar of a Prague sports bar where a crowd of expats had gathered. We had been hoping to go home to bed early, confident of victory. Those hopes had evaporated in a flurry of early precinct reports from Florida and Ohio.

By 3 AM, conversation had died and we were grimly sipping beers and watching as those two key states seemed to be slipping further and further to crimson. Suddenly, a friend who had left two hours earlier rushed in and handed us a printout.
"Zogby's calling it for Kerry." He smacked the sheet decisively. "Definitely. He's got both Florida and Ohio in the Kerry column. Kerry only needs one." Satisfied, we went to bed, confident we would wake with the world a better place. Victory was at hand.

The morning told a different story, of course. No Florida victory for Kerry - Bush had a decisive margin of nearly 400,000 votes. Ohio was not even close enough for Kerry to demand that all the votes be counted. The pollsters had been dead wrong; Bush had four more years and a powerful mandate. Onward Christian soldiers - next stop, Tehran.

Lies, Damn Lies, and Statistics
I work with statistics and polling data every day. Something rubbed me the wrong way. I checked the exit polls for Florida - all wrong. CNN's results indicated a Kerry win: turnout matched voter registration, and independents had broken 59% to 41% for Kerry.

Polling is an imprecise science. Yet its very imprecision is itself quantifiable and follows regular patterns. Differences between actual results and those expected from polling data must be explainable by identifiable factors if the polling sample is robust enough. With almost 3,000 respondents in Florida alone, the CNN poll sample was pretty robust.
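To put a rough number on that imprecision, here is a minimal sketch of the textbook margin of error for a sample of 3,000. It assumes a simple random sample and the worst-case 50/50 split; real exit polls sample precincts in clusters, so their true margin is wider than this figure. The function name and defaults are illustrative and are not taken from CNN's methodology.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents.

    Assumes simple random sampling, which is an idealization for exit polls.
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(3000)
print(f"Margin of error: {moe:.1%}")
```

Under these assumptions the margin comes to a little under two percentage points.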

The first signs of the rat were identified by Kathy Dopp, who conducted a simple analysis of voter registrations by party in Florida and compared them to presidential vote results. Basically she multiplied the total votes cast in a county by the percentage of voters registered Republican: this gave an expected Republican vote. She then compared this to the actual result.
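The arithmetic behind this comparison can be sketched in a few lines. The county figures below are invented purely for illustration and are not taken from Dopp's data; the function names are likewise hypothetical.

```python
def expected_republican_votes(total_votes, republican_registration_share):
    """Expected Republican vote: total votes cast times the registered-Republican share."""
    return total_votes * republican_registration_share

def percent_deviation(actual, expected):
    """How far the actual vote sits above (+) or below (-) the expected vote, in percent."""
    return (actual - expected) / expected * 100

# Hypothetical county: 10,000 votes cast, 30% of voters registered Republican,
# 4,500 votes actually recorded for the Republican candidate.
expected = expected_republican_votes(10_000, 0.30)
deviation = percent_deviation(4_500, expected)
print(f"expected {expected:.0f}, actual 4500, deviation {deviation:+.0f}%")
```

In this made-up county the Republican candidate runs 50% ahead of the registration-based expectation.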

Her analysis is startling. Certain counties voted for Bush far in excess of what one would expect based on the share of Republican registrations in that county. The key phrase is "certain counties" - there is extraordinary variance between individual counties. Most counties fall more or less in line with what one would expect based on the share of Republican registrations, but some differ wildly.

How to explain this incredible variance? Dopp found one overriding factor: whether the county used electronic touch-screen voting, or paper ballots which were optically scanned into a computer. All of the counties with touch-screen voting had results relatively in line with her expected results, while all of the counties with extreme variance used optical scanning.

The intimation, clearly, is fraud. Ballots are scanned; results are fed into precinct computers; these are sent to a county-wide database, whose results are fed into the statewide electoral totals. At any point after physical ballots become databases, the system is vulnerable to external hackers.

It seemed too easy, and Dopp's method seemed simplistic. I re-ran the results using CNN's exit polling data. In each county, I took the number of registrations and assigned correctional factors based on the CNN poll to predict turnout among Republicans, Democrats, and independents. I then used the vote shares from the polls to predict a likely number of Republican votes per county. I compared this 'expected' Republican vote to the actual Republican vote.
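A model of this shape can be sketched as follows. The county and every figure in it are placeholders chosen for illustration, apart from the 41% Bush share among independents, which follows from the 59% to 41% split quoted earlier. This is a hedged reconstruction of the general approach, not the actual spreadsheet behind the analysis.

```python
def expected_bush_votes(registrations, turnout, bush_share):
    """Sum over party groups: registered voters x turnout rate x Bush vote share."""
    return sum(
        registrations[group] * turnout[group] * bush_share[group]
        for group in registrations
    )

# Hypothetical county. Turnout rates and the partisan vote shares are assumptions.
registrations = {"rep": 4000, "dem": 5000, "ind": 1000}
turnout = {"rep": 0.8, "dem": 0.7, "ind": 0.5}
bush_share = {"rep": 0.9, "dem": 0.1, "ind": 0.41}

expected = expected_bush_votes(registrations, turnout, bush_share)
print(f"Expected Bush votes: {expected:.0f}")
```

The resulting figure is then compared with the county's reported Bush total, just as in the simpler registration-only comparison.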

The results are shocking. Overall, Bush received 2% fewer votes than expected in counties with electronic touch-screen voting. In counties with optical scanning, he received 16% more. This 16% would not be strange if it were spread across counties more or less evenly. It is not. In 11 different counties, the 'actual' Bush vote was at least twice the expected vote. 13 counties had Bush vote tallies 50 - 100% higher than expected. In one county where 88% of voters are registered Democrats, Bush got nearly two thirds of the vote - three times more than predicted by my model.
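The two summaries used here, an overall deviation per voting technology and a list of counties where the actual vote is at least double the expected one, could be computed along these lines. The three counties are invented and deliberately tiny; they do not reproduce the Florida figures.

```python
from collections import defaultdict

# Hypothetical rows: expected and actual Bush votes per county.
counties = [
    {"name": "A", "tech": "touch-screen", "expected": 1000, "actual": 980},
    {"name": "B", "tech": "optical-scan", "expected": 1000, "actual": 2100},
    {"name": "C", "tech": "optical-scan", "expected": 2000, "actual": 2100},
]

def deviation_by_technology(rows):
    """Percent by which total actual votes exceed total expected votes, per technology."""
    totals = defaultdict(lambda: [0.0, 0.0])  # tech -> [actual, expected]
    for row in rows:
        totals[row["tech"]][0] += row["actual"]
        totals[row["tech"]][1] += row["expected"]
    return {tech: (a - e) / e * 100 for tech, (a, e) in totals.items()}

def outlier_counties(rows, ratio=2.0):
    """Names of counties whose actual vote is at least `ratio` times the expected vote."""
    return [row["name"] for row in rows if row["actual"] / row["expected"] >= ratio]

by_tech = deviation_by_technology(counties)
flagged = outlier_counties(counties)
print(by_tech, flagged)
```

Separating the aggregate from the per-county ratios matters because a modest overall deviation can hide a handful of extreme counties, which is exactly the pattern described above.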

Again, polling can be wrong. It is difficult to believe it can be that wrong. Fortunately, however, we can test how wrong it would have to be to give the 'actual' result.

I tested two alternative scenarios to see how wrong CNN would have to have been to explain the election result. In the first, I assumed they had been wildly off the mark in the turnout figures - i.e. far more Republicans and independents had come out than Democrats. In the second I assumed the voting shares were completely wrong, and that the Republicans had been able to