The Devil's Excrement

On Mathematical models of the recall vote and fraud, part VIII: The physicists’ chopped up binomial distribution

September 7, 2004

This is a rewrite of last night’s post. Quico wanted it to be clear to 12 year olds; that might be stretching it, but I hope it works for 21 year olds:

Isbelia Martin and a group of physicists at Simón Bolívar University have been looking at the statistics of the number of votes for each voting machine and by state.

The behaviour of the votes in an electoral process should follow what is called a binomial distribution. A binomial distribution occurs whenever a process has two possible outcomes; the classical example is flipping a coin. If the coin is fair, half of the time you get heads and half of the time you get tails. You get a distribution when you do an experiment many times: suppose you flip a coin 100 times and record how many heads you get, and then you repeat that whole experiment 100 times. You then count how many times you got only one head (very unlikely), two, three and so on. At the end you divide the frequency of each one of these cases by the number of experiments and you get a probability distribution like this one that I stole from this site:
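The coin experiment just described is easy to try on a computer. The following is only a sketch: the function name `coin_experiment` and the fixed random seed are my own choices, not something taken from the physicists' work.

```python
import random
from collections import Counter

def coin_experiment(flips=100, repeats=100, p_heads=0.5, seed=0):
    """Flip a coin `flips` times, repeat that `repeats` times, and return
    {number of heads: fraction of experiments that gave that number}."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(
        sum(rng.random() < p_heads for _ in range(flips))
        for _ in range(repeats)
    )
    return {heads: times / repeats for heads, times in sorted(counts.items())}

dist = coin_experiment()
# Crude text histogram: one '#' per percent of experiments.
for heads, prob in dist.items():
    print(f"{heads:3d} heads: {'#' * round(prob * 100)}")
```

With only 100 repetitions the histogram is ragged; raising `repeats` makes it approach the smooth bell shape.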

The voting process is similar in that the voters are, in theory, fairly independent in their decisions. In the case of the voting process, each voting center plays the role of one coin-flipping experiment, and it has either a total number of voters or a total number of registered voters. So, you could construct a probability distribution, much like the one above, in which you could plot, as a first simple case, the number of people that actually voted. This is a binomial, because the voter decides between two choices: whether to go and vote or not. Each voter is assumed to be independent of the others, even though there may be family pressures to go and vote. The main difference between this problem and the coin problem is that the probabilities are not 0.5 for each case. In fact, in the recall vote abstention was approximately 32%, so you could say that the probability of any given person voting was p=0.68 and the probability of that person not voting was 0.32.

What is different in this problem is that you have machines of different size, so what you can count is how many people voted n in each machine of size N, and then plot the frequency of occurrences for each machine of size N. What you get in the case of the voters in the recall referendum is something very similar to the distribution of the coin toss problem, which is expected, since both are binomial processes.

Mathematically, you can calculate the probability for a binomial process that you will get a value of n voters showing up to vote for each machine with N registered voters. If we have machines with N voters each, the probability that a voter will go and vote is p and the probability that he or she will not go and vote is q=1-p. Thus, if we have M machines with N voters, the number of voters n that do go and vote will be between 0 and N and will follow what is called a binomial distribution, given by

P(n) = [N! / (n! (N-n)!)] p^n q^(N-n)

This is a bell shaped curve like the one plotted above.
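To put numbers into the formula, here is a small sketch that evaluates it for one machine. The machine size N=300 is a made-up illustration, and p=0.68 is the approximate turnout quoted earlier; `binomial_pmf` is just my name for the formula.

```python
from math import comb

def binomial_pmf(n, N, p):
    """Probability that exactly n of N independent voters show up,
    when each one shows up with probability p."""
    q = 1 - p
    return comb(N, n) * p**n * q**(N - n)

N, p = 300, 0.68  # N is a hypothetical machine size; p is the turnout from the text
probs = [binomial_pmf(n, N, p) for n in range(N + 1)]

# The curve peaks close to N*p and the probabilities add up to 1.
most_likely = max(range(N + 1), key=probs.__getitem__)
print("most likely number of voters:", most_likely)
print("sum of all probabilities:", sum(probs))
```

`comb(N, n)` is the N!/(n!(N-n)!) factor, computed with exact integers so the factorials do not overflow.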

Suppose we now plot instead the number of voters that did not go and vote (abstention) as a function of the number of voters that were registered at each machine. If the distribution is binomial, the points for the abstention should form a cloud that opens up like the tail of a comet, with the greatest density along an imaginary line whose slope equals the average abstention rate in that population. If half the people abstained, this cloud would lie along the line of slope 0.5 (about 26.6 degrees with respect to the horizontal), but since in the case of the recall the percentage of abstention was 32%, the cloud would be below that line.

Below is a plot of such graph for the number of people n that did not go and vote in all of the centers in Miranda state as a function of the number of voters registered per machine N:

Plot of the number of voters that abstained as a function of the number of registered voters N at each machine for Miranda state.

This is a textbook type of example of what one should get for a process that should follow a binomial distribution. Thus, the first conclusion is that the data from the recall vote in terms of the choice between going to vote or not behaves in Miranda state and nationally, much like what is expected from a binomial distribution.
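The comet-tail shape can be reproduced with made-up data. The sketch below invents 2000 machines with sizes between 50 and 600 (these sizes are an assumption, not the real Miranda machines), lets each registered voter abstain independently with probability 0.32, and then measures the slope of the cloud and how much it spreads out for small and large machines.

```python
import math
import random

def simulate_abstention(sizes, p_abstain=0.32, seed=1):
    """For each machine size N, draw how many of the N voters abstain."""
    rng = random.Random(seed)
    return [(N, sum(rng.random() < p_abstain for _ in range(N))) for N in sizes]

size_rng = random.Random(0)
sizes = [size_rng.randint(50, 600) for _ in range(2000)]  # hypothetical machines
points = simulate_abstention(sizes)

# Least-squares line through the origin: its slope estimates the abstention rate.
slope = sum(N * n for N, n in points) / sum(N * N for N, _ in points)
angle = math.degrees(math.atan(slope))

def spread(lo, hi):
    """Root-mean-square distance from the line for machines with lo <= N < hi."""
    res = [n - slope * N for N, n in points if lo <= N < hi]
    return math.sqrt(sum(r * r for r in res) / len(res))

small, large = spread(50, 150), spread(500, 601)
print(f"slope {slope:.3f}, angle {angle:.1f} degrees")
print(f"spread for small machines {small:.1f}, for large machines {large:.1f}")
```

The spread grows roughly like the square root of N, which is what makes the cloud open up like a comet tail as the machines get bigger.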

The same logic should apply to the SI and NO votes. It should be a binomial distribution, since it represents a choice between two possibilities. If the vote split were a perfect 50%/50% for the Si and the No, and one plotted the number of votes n for one or the other possibility as a function of the number of actual voters at each machine N, the cloud would spread below the 45 degree line that divides the plane, along an imaginary line of slope 0.5 (about 26.6 degrees from the horizontal). In the recall vote, since the No won, if one plots the dispersion plot for the NO votes one would get a cloud above that line, and a similar cloud below it for the SI votes, which is also plotted below.

However, what is observed is completely different as seen in the next graph for the number of NO votes n, in Miranda state as an example, as a function of the number of voters in each machine N:

Plot of the number of No voters as a function of the number of voters N at each machine

Instead of obtaining a single cloud, one obtains two separate areas of high density with a valley of low density separating them. I have drawn three imaginary lines to guide the eye: one along the valley (the area with low density between the two thicker clouds) and one along each of the two separate clouds at either side.

Exactly the same type of behaviour is seen for the number of SI votes n in Miranda state as a function of the number of voters in each voting center N:

Plot of the number of Si voters as a function of the number of voters N at each machine

This shows the same low density valley, where I have drawn a line to guide the eye and two clouds at each side.

Thus the Miranda data, which conforms to a binomial distribution when one looks at the binomial process of abstention versus voting, does not conform to a binomial distribution when one looks at the choice between Si and No. In fact, according to the authors, the Si and No data for Miranda state would NEVER conform to a binomial distribution. This is the second conclusion: the data for the Si and No votes does not conform to a binomial, even though it is part of the same data that did conform to a binomial in the case of the abstention.

Even more interesting, the same type of behaviour has been seen in Zulia, Carabobo, Anzoategui, Tachira and Lara, but "textbook" type of behaviour is found in other states such as Falcon and Vargas. Other smaller states also show classical behaviour. This creates a big problem: how would one explain that some states behave exactly like a binomial, textbook cases with no discrepancies, while certain selected states do not?

In order to try to understand this unusual behaviour, the authors plotted the histogram of occurrences for the Si and NO votes as shown in the next figure:

Histogram of the occurrences of the Si (red), No (blue) votes as a function of the number of votes.

There are two distributions plotted in this figure: the red bars are the distribution of occurrences of the number of Si votes for each machine with N voters, and the blue bars are the distribution for the number of NO votes as a function of the number of voters N in the machine. As you can see, it is as if the Si votes had had a piece chopped out for machines in which the number of registered voters was above 250 and up to 350. This data is for Miranda state, but if one looks at a similar histogram at the national level, the same type of "chopped up" binomial distribution is observed. This is the third conclusion: the distribution is a binomial that appears to have part of it "chopped up", as if part of the Si votes were shifted to No votes.

It is this same chopping up which accounts for the valleys in the two unusual dispersion curves.

Thus, it would seem that the process is not at all like a binomial, as it should be, but follows instead a distribution which appears to have had some form of artificiality selectively introduced into it, creating two types of distribution. Curiously, the abstention had the proper behaviour expected from a binomial, even though it is part of the same data. This result is consistent with the hypothesis of Hausmann and Rigobon that only a certain number of machines may have been manipulated; in this case, the data suggests it was a selection based on the number of registered voters per machine which determined whether the data was manipulated or not.
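To see how two kinds of machines can turn one cloud into two, here is a toy simulation. It is not the physicists' analysis, and every number in it is invented for illustration: 1000 hypothetical machines, a No probability of 0.59 in the "single" case, and 0.70 or 0.48 (picked at random per machine) in the "mixed" case. The `dispersion` function compares the scatter of the points with what one pooled binomial would give; a value near 1 means the data looks binomial.

```python
import random

def simulate(sizes, pick_p, seed=2):
    """Draw the NO count for each machine; pick_p(rng) chooses that machine's p."""
    rng = random.Random(seed)
    out = []
    for N in sizes:
        p = pick_p(rng)  # one probability per machine, shared by all its voters
        out.append((N, sum(rng.random() < p for _ in range(N))))
    return out

def dispersion(points):
    """Average of (n - N*p)^2 / (N*p*q) using the pooled p; about 1 for a binomial."""
    p = sum(n for _, n in points) / sum(N for N, _ in points)
    return sum((n - N * p) ** 2 / (N * p * (1 - p)) for N, n in points) / len(points)

size_rng = random.Random(0)
sizes = [size_rng.randint(200, 600) for _ in range(1000)]  # hypothetical machines

single = simulate(sizes, lambda rng: 0.59)  # every machine behaves the same
mixed = simulate(sizes, lambda rng: 0.70 if rng.random() < 0.5 else 0.48)  # two kinds

print("dispersion, one kind of machine:", round(dispersion(single), 2))
print("dispersion, two kinds of machine:", round(dispersion(mixed), 2))
```

Plotting the `mixed` points as NO votes against N gives two clouds with an emptier valley between them, while the `single` points give one comet tail.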
