Archive for September, 2004

Conned in Caracas from the WSJ

September 9, 2004

An opinion piece on the fraud in the Wall Street Journal:


CONNED IN CARACAS


 

New evidence that Jimmy Carter got fooled in Venezuela.

Thursday, September 9, 2004 12:01 a.m. EDT

Both the Bush Administration and former President Jimmy Carter were quick to bless the results of last month’s Venezuelan recall vote, but it now looks like they were had. A statistical analysis by a pair of economists suggests that the random-sample “audit” results that the Americans trusted weren’t random at all.

This is no small matter. The imprimatur of Mr. Carter and his Carter Center election observers is being used by Venezuelan President Hugo Chavez to claim a mandate. The anti-American strongman has been steering his country toward dictatorship and is stirring up trouble throughout Latin America. If the recall election wasn’t fair, why would Americans want to endorse it?

The new study was released this week by economists Ricardo Hausmann of Harvard and Roberto Rigobon of MIT. They zeroed in on a key problem with the August 18 vote audit that was run by the government’s electoral council (CNE): In choosing which polling stations would be audited, the CNE refused to use the random number generator recommended by the Carter Center. Instead, the CNE insisted on its own program, run on its own computer. Mr. Carter’s team acquiesced, and Messrs. Hausmann and Rigobon conclude that, in controlling this software, the government had the means to cheat.

“This result opens the possibility that the fraud was committed only in a subset of the 4,580 automated centers, say 3,000, and that the audit was successful because it directed the search to the 1,580 unaltered centers. That is why it was so important not to use the Carter Center number generator. If this was the case, Carter could never have figured it out.”


Mr. Hausmann told us that he and Mr. Rigobon also “found very clear trails of fraud in the statistical record” and a probability of less than 1% that the anomalies observed could be pure chance. To put it another way, they think the chance is 99% that there was electoral fraud.



The authors also suggest that the fraud was centralized. Voting machines were supposed to print tallies before communicating by Internet with the CNE center. But the CNE changed that rule, arranging to have totals sent to the center first and only later printing tally sheets. This increases the potential for fraud because the Smartmatic voting machines suddenly had two-way communication capacity that they weren’t supposed to have. The economists say this means the CNE center could have sent messages back to polling stations to alter the totals.

None of this would matter if the auditing process had been open to scrutiny by the Carter observers. But as the economists point out: “After an arduous negotiation, the Electoral Council allowed the OAS [Organization of American States] and the Carter Center to observe all aspects of the election process except for the central computer hub, a place where they also prohibited the presence of any witnesses from the opposition. At the time, this appeared to be an insignificant detail. Now it looks much more meaningful.”

Yes, it does. It would seem that Colin Powell and the Carter Center have some explaining to do. The last thing either would want is for Latins to think that the U.S. is now apologizing for governments that steal elections. Back when he was President, Mr. Carter once famously noted that the Afghanistan invasion had finally caused him to see the truth about Leonid Brezhnev. A similar revelation would seem to be in order toward Mr. Chavez.

Opposition presents its preliminary report on fraud

September 9, 2004

The Coordinadora Democratica presented its preliminary report today on the case for fraud. The report is basically divided into four broad parts:


1) How the Chavez Government controlled institutions and used them to its advantage from 1998 on, including controlling the Electoral powers and the manipulation of processes in the judicial system.

2) The drive to provide foreigners with ID’s as a contribution to fraud in the electoral process.

3) Mathematical studies and evidence obtained from the transmission of data that was in violation of the regulations and suggests some form of manipulation may have taken place.

4) Control of the process and the role of observers.


 


Part 1) of the document may be of interest to someone who became aware of the Venezuelan political crisis only recently, but in some respects it is what this blog has been covering for the last two years. Thus, I will highlight only what the report says in the other three broad sections.


 


-The drive to add voters


 


The report describes how, on April 9, 2004, the Government began a drive to provide ID cards to people over 18. The Government created special offices run by MVR party members and not by ONIDEX, which is the agency in charge of this. They used a system without any type of security, and people were given the ID cards without verification of data. The process had no controls, and all the people who were given ID’s were automatically registered to vote.


 


This process led to 1.8 million people being given ID’s in only four months, without controls or supervision, in indiscriminate fashion, and without any of the Government offices in charge of these processes being involved. A large part of these newly registered voters were in rural regions, registered in centers where voting was manual.


 


In twenty of the twenty-four states the electoral registry exceeded the historical value of 48% of the population, with only four states below 50%, and the electoral registry overall increased from 48% to 54% of the population in six months.


 


By now, some may be thinking or saying that this is just democracy at work and that what the Chavez Government did was simply to add people to the electoral rolls. However, there is now a general phenomenon in which towns, cities and municipalities have more registered voters than inhabitants. This strange behavior is concentrated in locations where the vote was manual rather than electronic.
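As a concrete illustration of that check, here is a minimal sketch of how one could flag such locations from a registry table; the column names and figures are hypothetical, not taken from the report:

```python
# Hypothetical sketch: flag municipalities where registered voters exceed
# inhabitants. Column names ("municipality", "registered_voters",
# "population", "voting_type") and the numbers are illustrative only.
import pandas as pd

registry = pd.DataFrame({
    "municipality": ["A", "B", "C"],
    "registered_voters": [12_500, 8_200, 30_100],
    "population": [11_900, 9_000, 29_500],
    "voting_type": ["manual", "automated", "manual"],
})

registry["ratio"] = registry["registered_voters"] / registry["population"]
anomalies = registry[registry["ratio"] > 1.0]

# The report's claim is that such anomalies cluster in manual-voting locations.
print(anomalies.groupby("voting_type").size())
```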


 


Mathematics and Communications


 


The report cites work done at the level of parishes in which statistical analysis shows anomalies in the final results at the local level. According to this study, which I have not seen in detail, it was demonstrated with 99% confidence that 26% of the voting machines had statistical values outside of what would be expected from the distribution at the parish level. The report also includes the Hausmann and Rigobon study as part of its evidence.
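As an illustration of the kind of test being described (the report's exact methodology is not spelled out here, so this is only an assumed form of it), one could flag machines whose SI counts fall outside a 99% binomial band around the parish-wide share:

```python
# A minimal sketch of a parish-level check: given the parish-wide SI share,
# flag machines whose SI count falls outside the range a binomial
# distribution would predict at 99% confidence. Illustration only; not
# necessarily the method used in the report.
from scipy.stats import binom

def flag_machines(machines, si_share, alpha=0.01):
    """machines: list of (votes_cast, si_votes) for each machine in one parish."""
    flagged = []
    for votes_cast, si_votes in machines:
        lo = binom.ppf(alpha / 2, votes_cast, si_share)
        hi = binom.ppf(1 - alpha / 2, votes_cast, si_share)
        if not (lo <= si_votes <= hi):
            flagged.append((votes_cast, si_votes))
    return flagged

# Toy parish: about 40% SI overall, one machine far below that share.
parish = [(400, 165), (380, 150), (410, 110)]
print(flag_machines(parish, si_share=0.40))
```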


 


What I found most interesting were the parts relating to communications. I knew little pieces of most of this, but I had not seen such an overview of it:


 


Bidirectionality: CANTV records demonstrate that there was two-way communication between the voting machines and the CNE servers. Recall that the President of Smartmatic stated after the recall vote that such communication was impossible.


 


Types of communications: The report argues that, given the nature of the process, the type of communications between each voting center and the CNE should be very similar. However, the study revealed three types of endings to the communications: i) those terminated by the voting machines, ii) those terminated by the server, and iii) those that appeared to arise from a loss of carrier. All three types occur with similar frequency, which has no explanation.


 


Traffic Patterns: If all of the machines were transmitting just the results, the volumes of data transmitted by each machine should be similar. This is not the case; there are wide differences in the traffic patterns from different machines and voting centers. This has no plausible explanation.


 


Transmissions out of schedule: There were connections all day beginning at 7 AM, even though the agreement was that there would be communications only after the machines had printed their results, as the regulations require.


 


-Control of the process and observers


 


The report is highly critical of the Carter Center. It says this results in part from the superficiality of the Center, which acted as if this were a normal voting process and not one in a country with the conflicts it has had in the last few years.


 


The report criticizes the fact that the Carter Center, unlike the OAS, at no time criticized the Government having maintained control over the whole Electoral process through its majority on the Electoral Board.


 


The report repeats what we already know about the hot audit, which was supposed to include 199 machines selected at random by a program designed by the CNE. Of these, only 27 were audited in the presence of the opposition, and in those 27 the SI won 63% to 37%.


 


The report also relates how it was impossible for anyone to enter the totalization room at the CNE. Not even the OAS and the Carter Center were allowed in even though it had been agreed that they could.


 


The report questions why, when the opposition was trying to find acceptable terms for the cold audit performed after the recall, it was told that the observers had already “agreed” on a procedure with the CNE, which would be a random selection of boxes. In a last-minute attempt to convince the opposition to accept the results of the audit, the Carter Center assured the opposition that the program to select the boxes would be under the control of the Carter Center.


 


And it is here that the report is most critical of the Carter Center, calling it inefficient, superficial and even irresponsible. The report says that the Carter Center program was an Excel program which was not used due to “technical reasons” (!!!!). Thus, the same program questioned on Sunday by the opposition and made by the CNE was used. The report is also critical of something that I pointed out here: the Carter Center places a lot of emphasis on the fact that its representatives were next to the ballot boxes all the time, but fails to point out that over 60 hours went by between August 15th and August 18th, when the audit was performed, during which the boxes were left alone.


 


The report also says that the boxes from Lara state and Bolivar state took a large number of hours to arrive, and that despite assurances that all the boxes from Caracas were at the Tuna fort in the southwest of the city, this turned out not to be the case.


 


Conclusions:


 


The report concludes by suggesting that the process be legally challenged, that the Electoral registry be legally challenged, that the Smartmatic system be questioned and that the next election be done with manual counting for the whole country.


 


In an interesting conclusion, the report suggests that the Coordinadora ask the US Government to apply the Foreign Corrupt Practices Act against Smartmatic and Verizon, the controlling shareholder of CANTV. This is an intriguing twist, as it would require that the evidence be presented and the case be tried in US courts, far from the control of the Chavez Government.

On Mathematical models of the recall vote and fraud, part VIII: The physicists’ chopped up binomial distribution

September 7, 2004


 


This is a rewrite of last night’s post. Quico wanted it to be clear to 12-year-olds; that might be stretching it, but I hope it works for 21-year-olds:


 


Isbelia Martin and a group of physicists at Simón Bolívar University have been looking at the statistics of the number of votes for each voting machine and by states.


 


The behaviour of the votes in an electoral process should follow what is called a binomial distribution. A binomial distribution occurs whenever a process has two possible outcomes; the classical example is flipping a coin. If the coin is fair, half of the time you get heads and half of the time you get tails. You get a distribution when you do an experiment many times: suppose you flip a coin 100 times and record how many heads you get, and then you repeat that experiment 100 times. You then record how many times you got only one head (very unlikely), two, three and so on. At the end you divide the frequency of each one of these cases by the number of experiments, and you get a probability distribution like this one that I stole from this site:

[Figure: a binomial probability distribution]

The voting process is similar in that the voters are in theory fairly independent in their decision. In the case of the voting process, the flipping of the coins becomes each voting center, which may have either a total number of voters or a total number of registered voters. So, you could construct a probability distribution, much like the one above, in which you could plot, as a first simple case, the number of people that actually voted. This is a binomial, because the voter decides between two choices whether to go and vote or not. Each voter is assumed to be independent of the other, even though there may be family pressures to go and vote. The main difference between this problem and the coin problem is that the probabilities are not 0.5 for each case. In fact, in the recall vote abstention was approximately 32%, so you could say that the probability of any given person voting was p=0.68 and the person not voting had a probability of 0.32.


 


What is different in this problem is that you have machines of different size, so what you can count is how many people voted n in each machine of size N, and then plot the frequency of occurrences for each machine of size N. What you get in the case of the voters in the recall referendum is something very similar to the distribution of the coin toss problem, which is expected, since both are binomial processes.


 


Mathematically, you can calculate the probability for a binomial process that n voters will show up to vote at a machine with N registered voters. If the probability that a voter goes to vote is p and the probability that he or she does not is q = 1 - p, then for M machines with N registered voters each, the number of voters n that do go and vote will be between 0 and N and will follow what is called a binomial distribution, given by


 


P(n) = [N! / (n! (N-n)!)] p^n q^(N-n)


 


This is a bell-shaped curve like the one plotted above.
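As a quick sketch of this formula in action (the machine size N = 400 below is just an illustrative number, and p = 0.68 is the turnout probability mentioned above), one can compare the binomial prediction with a direct simulation of many machines:

```python
# Sketch of the turnout model described above: for a machine with N
# registered voters and turnout probability p, the number of voters n
# follows P(n) = [N!/(n!(N-n)!)] p^n q^(N-n) with q = 1 - p. Compare the
# formula with a direct simulation of many such machines.
import numpy as np
from scipy.stats import binom

N, p = 400, 0.68          # illustrative machine size and turnout probability

# Theoretical binomial probabilities for n = 0..N
n_values = np.arange(N + 1)
pmf = binom.pmf(n_values, N, p)

# Simulate 10,000 machines of size N and tally how many voters showed up
rng = np.random.default_rng(0)
simulated = rng.binomial(N, p, size=10_000)

print("theoretical mean turnout:", N * p)
print("simulated mean turnout:  ", simulated.mean())
print("most likely turnout:     ", n_values[pmf.argmax()])
```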


 


Suppose we now plot instead the number of voters that did not go and vote (abstention) as a function of the number of voters registered at each machine. If the distribution is binomial, the points for the abstention should form a cloud that opens up like the tail of a comet, with the greatest density along an imaginary line whose slope is proportional to the average abstention in that population. If half the people abstained, this cloud would lie along the line of slope 1/2 (halfway up to the 45-degree diagonal), but since in the case of the recall the abstention was 32%, this cloud lies lower, along a line of slope of about 0.32.


 


Below is such a plot for the number of people n that did not go and vote, in all of the centers in Miranda state, as a function of the number of voters registered per machine N:

Plot of the number of voters that abstained as a function of the number of registered voters N at each machine for Miranda state.


 


This is a textbook example of what one should get from a process that follows a binomial distribution. Thus, the first conclusion is that, in terms of the choice between voting and not voting, the data from the recall vote behaves in Miranda state, and nationally, much like what is expected from a binomial distribution.
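Here is a small simulated version of that dispersion plot, assuming a 32% abstention probability and arbitrarily chosen machine sizes; a clean binomial process produces the single comet-like cloud described above:

```python
# Sketch of the dispersion plot described above, using simulated data: for
# machines with varying numbers of registered voters N, draw the number of
# abstainers from a binomial with abstention probability 0.32 and plot it
# against N. A clean binomial process gives one "comet tail" cloud hugging
# the line of slope 0.32.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
N = rng.integers(150, 600, size=4000)          # registered voters per machine
abstainers = rng.binomial(N, 0.32)             # binomial abstention per machine

plt.scatter(N, abstainers, s=3, alpha=0.3)
plt.plot([0, 600], [0, 0.32 * 600], color="red")   # expected slope-0.32 line
plt.xlabel("Registered voters per machine (N)")
plt.ylabel("Abstainers per machine")
plt.title("Simulated abstention: a single binomial cloud")
plt.show()
```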


 


The same logic should apply to the SI and NO votes: each should follow a binomial distribution, since it represents a choice between two possibilities. If the vote split were a perfect 50%/50% between SI and NO, and one plotted the number of votes n for either option as a function of the number of actual voters at each machine N, the cloud would spread below the 45-degree diagonal, along an imaginary line of slope 1/2. In the recall vote, since the NO won, plotting the dispersion of the NO votes one would get a cloud above that line, and a corresponding cloud below it for the SI votes, which is also plotted below.


 


However, what is observed is completely different as seen in the next graph for the number of NO votes n, in Miranda state as an example, as a function of the number of voters in each machine N:

Plot of the number of NO voters as a function of the number of voters N at each machine.


 


 


Instead of obtaining a single cloud, one obtains two separate areas of high density with a valley of low density separating them. I have drawn three imaginary lines: one to guide the eye along the valley (the area of low density between the two thicker clouds) and one along each of the two separate clouds on either side.


 


Exactly the same type of behaviour is seen for the number of SI votes n in Miranda state as a function of the number of voters in each voting center N:

Plot of the number of SI voters as a function of the number of voters N at each machine.


 


 


This shows the same low-density valley, where I have drawn a line to guide the eye, with the two clouds on either side.


 


Thus, the Miranda data, which conforms to a binomial distribution when one looks at the binomial process of abstention versus voting, does not conform to a binomial distribution for the SI and NO votes. In fact, according to the authors, the SI/NO data for Miranda state could NEVER conform to a binomial distribution. This is the second conclusion: the data for the SI and NO votes does not conform to a binomial, and never could, even though it is part of the same data set that did conform to a binomial in the case of abstention.


 


Even more interesting, the same type of behaviour has been seen in Zulia, Carabobo, Anzoategui, Tachira and Lara, but “textbook” behaviour is found in other states such as Falcon and Vargas. Other smaller states also show classical behaviour. This creates a big problem: how would one explain that some states behave exactly like a binomial, textbook cases with no discrepancies, while certain selected states do not?


 


In order to try to understand this unusual behaviour, the authors plotted the histogram of occurrences for the SI and NO votes as shown in the next figure:

Histogram of the occurrences of the SI (red) and NO (blue) votes as a function of the number of votes.


 


There are two distributions plotted in this figure: the SI bars (red) are the distribution of occurrences of the number of SI voters for each machine with N voters, and the blue bars are the corresponding distribution for the NO voters. As you can see, it is as if the SI votes had had a piece chopped out for machines in which the number of registered voters was above 250 and up to 350. This data is for Miranda state, but if one looks at a similar histogram at the national level, the same type of “chopped up” binomial distribution is observed. This is the third conclusion: the distribution is a binomial that appears to have had part of it “chopped up”, as if part of the SI votes had been shifted to NO votes.
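A toy simulation can illustrate the hypothesis (this is only a sketch of the mechanism, with made-up numbers, not a reconstruction of the actual data): start from clean binomial SI/NO counts per machine and, on every machine whose SI count falls in a chosen band, move a fixed chunk of SI votes to NO; the SI histogram develops an empty band and the NO histogram a matching bulge:

```python
# Toy model of the "chopped" histogram (made-up numbers, not the real data):
# start from clean binomial SI/NO counts per machine, then remove a fixed
# chunk of SI votes, adding it to NO, on every machine whose SI count falls
# in a chosen band. The SI histogram ends up with an empty band and the NO
# histogram gains a matching bulge.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n_machines = 5000
voters = rng.integers(300, 600, size=n_machines)   # actual voters per machine
si = rng.binomial(voters, 0.55)                     # clean SI count per machine
no = voters - si                                    # clean NO count per machine

band = (si >= 250) & (si <= 350)                    # machines selected for tampering
si[band] -= 60                                      # shift 60 votes from SI...
no[band] += 60                                      # ...to NO on those machines

bins = np.arange(0, voters.max() + 10, 10)
plt.hist(si, bins=bins, alpha=0.5, color="red", label="SI per machine")
plt.hist(no, bins=bins, alpha=0.5, color="blue", label="NO per machine")
plt.legend()
plt.show()
```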


 


It is this same chopping up which accounts for the valleys in the two unusual dispersion curves.


 


Thus, it would seem as if the process is not at all binomial, as it should be, but instead follows a distribution into which some form of artificiality appears to have been selectively introduced, creating two types of distribution. Curiously, the abstention shows the proper behaviour expected from a binomial, even though it is part of the same data. This result is consistent with the hypothesis of Hausmann and Rigobon that only a certain number of machines may have been manipulated; in this case, the data suggests it was a selection based on the number of registered voters per machine that determined whether the data was manipulated or not.

WSJ article on the Hausmann, Rigobon study

September 7, 2004

The Wall Street Journal pays attention and quotes the academic that the Carter Center quoted when Taylor said he was in error.


Academics’ Study Backs Fraud Claim in Chavez Election


By DAVID LUHNOW in Mexico City and JOSE DE CORDOBA in Miami
Staff Reporters of THE WALL STREET JOURNAL
September 7, 2004; Page A18


Two Venezuelan academics claim to have found statistical evidence of fraud in last month’s referendum on President Hugo Chavez, fueling the opposition’s claims of a rigged vote and raising the possibility that despite Mr. Chavez’s victory, the country’s tense standoff will continue.


The claims were made Sunday by Ricardo Hausmann, a professor at Harvard University’s John F. Kennedy School of Government and former chief economist at the Inter-American Development Bank, and Roberto Rigobon, a professor of applied economics at the Massachusetts Institute of Technology’s Sloan School of Management.


The pair issued a report that tried to measure the possibility that the vote was clean using two separate analyses of the official results. In both cases, they said, the chances of a clean vote were less than one in 100.


Members of a civic group called Sumate that organized the referendum, which Mr. Chavez won by a 59% to 41% margin, seized on the study to suggest Mr. Chavez had won by tampering with the electronic-voting machines used in the contest. “We don’t think the truth about the referendum has been revealed yet,” Alejandro Plaz, a spokesman for Sumate, told reporters in presenting Mr. Hausmann’s study Sunday. Sumate requested help from the academics in analyzing the referendum data but didn’t pay for the study.


Mr. Chavez’s government reacted with disbelief to the claims, saying the opposition’s previous claims of fraud had so far proved incorrect. Vice President Jose Vicente Rangel said members of the Atlanta-based Carter Center and the Organization of American States had already validated the result. “No one believes in their theories anymore because three weeks have gone by and they haven’t been able to prove anything,” Mr. Rangel said.


Members of the Carter Center and the OAS were unreachable for comment yesterday. But both organizations have consistently stood by their findings in the past weeks and watched as other theories of fraud fell short under scrutiny.


The results of the study, however, prompted some independent experts on computer voting to call on the Venezuelan government to open up all aspects of the election — including electronic codes from voting machines — to public scrutiny.


“The Hausmann/Rigobon study is more credible than many of the other allegations being thrown around,” said Aviel Rubin, a computer-science professor at Johns Hopkins University who has warned about security flaws with electronic voting. Mr. Rubin recently conducted a study of opposition claims that machines were rigged to limit the number of votes against Mr. Chavez and concluded the claims were highly unlikely.


“I would encourage the Venezuelan government to open up all aspects of the election to public inspection, not just to selected observers. That includes all of the paper ballots, the source code in the voting machines, the random generators … that were used to pick the sites to audit,” he said in an e-mail interview.


The study by Messrs. Hausmann and Rigobon suggested the government may have tampered with only some of the machines, leaving others clean for observers to audit. They said the sample used for the audit, which was carried out days after the election, wasn’t randomly chosen and limited to the “clean” machines.


The study says the computer that determined which ballot boxes were to be subjected to a recount belonged to Venezuelan election officials. However, the Carter Center’s Jennifer McCoy has said the group tested and verified the computer program used to select the sample.


The study compared the votes obtained by the opposition during the recall vote with the signatures gathered in November 2003 requesting the referendum. For the recounted votes, the number of “yes” votes matched the 2003 petition numbers at a rate that was 10% higher than in the ballot boxes that weren’t recounted. They calculate the probability of this taking place by chance at less than 1%.


The government’s sample recount “was not a random sample, and I can say that with 99% confidence,” Mr. Hausmann said in a telephone interview.


The academics used another technique to look for suspicious patterns in the results, using the 2003 petition and an exit poll on the day of the vote as a vague measure of a voter’s intention. Because both measures are imperfect for different reasons, the academics argued, the measures should make different mistakes in predicting the final result.


But the academics found that each method had similar margins of error when compared with the official results, something that would happen only one in 100 times without fraud, they argued.

Diebold’s growing pains in electronic voting

September 7, 2004

Article on bloomberg.com about the rough time Diebold has had in the field of electronic voting.

Capriles to be tried in freedom

September 7, 2004

I have not posted anything about Henrique Capriles Radonsky, the Mayor of the Baruta municipality of Caracas, being freed because I was looking for something wise to say. He should never have been jailed: according to Venezuela’s laws, he should have been tried in freedom, but justice was manipulated sufficiently to keep him jailed for quite a while. My only educated guess at this point is that after his case was handled by nine different judges, and the last one, Leon Villanueva, was detained for extortion, they could not find any more judges that could be manipulated easily.

On Mathematical models of the recall vote and fraud, part VII: Hausmann and Rigobon, a wedge of black swans?

September 6, 2004

Yesterday, Sumate held a press conference which I did not mention because I simply did not understand what they had done in terms of proving or not the existence of fraud. It was based on a study by Ricardo Hausmann of Harvard University and Roberto Rigobon of MIT (HR) which was made available (at least to me!) today. The study they did is quite technical, so I will not try to explain it in detail but only give you an idea of what they did. The report is entitled: Searching for a black swan: Analysis of the statistical evidence about electoral fraud in Venezuela.


Besides the technical details, the report repeats a number of issues which are important in understanding the possibility of fraud; it also adds some information which should be here for the record. According to HR these are the elements that lead to a presumption of fraud:


 


1-The opposition wanted a manual count from the beginning, but it was electronic.


2-There was no manual counting of the printed ballots; instead, there was to be a “hot audit” of 1% of the ballot boxes, which never took place. Only 78 boxes were counted, with the opposition present in the counting of 28.


3-International Observers were not allowed in the totalization room, neither was the opposition.


4-The voting machines had bidirectional communications. (My note: The Head of Smartmatic said this was not the case in his press conference the week after the vote).


5-Contrary to what was agreed on, the voting machines communicated with the servers before printing the results.


6-Exit polls disagree with the results. (I add: And those whose details have been made available agree with each other)


7-In the second audit the random selection of boxes was made using the software provided by the CNE.


 


Exit Polls


 


The report provides interesting data on them. For example:


 


Percentage of SI votes in Sumate’s exit poll: 59.5%
Percentage of SI’s according to CNE in centers where Sumate did exit polls: 42.9%
Percentage of SI votes in PJ’s exit poll: 62.6%
Percentage of SI’s according to CNE in centers where PJ did exit polls: 42.9%


 


The idea is that people always dismiss exit polls by suggesting they were not done in the right places. Well, given that the SI received 40.63% at the national level, the CNE result in the centers where the exit polls were performed is not that different from the national figure.


 


Caps or coincidences


 


They test statistically for the caps and conclude that if there was fraud, it was not carried out by imposing caps on the maximum number of votes per machine. For quite a while I have referred to the caps as coincidences, believing they may be a consequence of the fraud rather than the way it was perpetrated. This would agree with that.


 


Detection of statistical fraud


 


What HR did was to look at how to measure the intention of the vote. To do that, they looked at two independent measures of voting intention: the exit poll results and the signatures from the Reafirmazo process. The idea is that each of these represents real data on voting intention, distinct from the actual vote. They then ran a regression between the signatures per center and the actual vote at those centers, and the same regression between the exit polls and the actual vote at those centers. In a regression you calculate the line or equation that best fits all of the points of the data you have, i.e., in the case of the exit polls, the line that best fits the results announced by the CNE at those same centers.


 


When the above two regressions are done, there are errors, that is, differences between the line and the points. But the sources of these errors are independent for the two processes. The only way they could be correlated (similar) is if the errors have a common source; in this case fraud is the only plausible common source of “correlation” between the two. Well, the mathematical comparison of the regression errors yielded a correlation of 0.24, where two things that behave exactly the same have correlation 1 and two that have nothing to do with each other have correlation zero. To put it simply: in voting centers where the signatures predicted a higher number of SI votes than the vote actually recorded, the exit polls also predicted a higher number.
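To make the logic concrete, here is a naive sketch on simulated data; it illustrates the idea of correlating the two sets of regression errors, not HR's exact econometric procedure:

```python
# Naive sketch of the residual-comparison idea on simulated data: regress
# each independent measure of intention (signatures, exit poll) on the
# official SI share and correlate the residuals. With a clean count the two
# residuals share nothing and their correlation sits near zero; a common
# distortion of the official count pushes the correlation up.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
intention = rng.uniform(0.2, 0.7, size=n)              # true SI intention by center
signatures = intention + rng.normal(0, 0.05, n)         # noisy measure 1
exit_poll = intention + rng.normal(0, 0.05, n)          # noisy measure 2

def residual_corr(official):
    def resid(x):
        slope, intercept = np.polyfit(official, x, 1)   # regress measure on the vote
        return x - (slope * official + intercept)
    return np.corrcoef(resid(signatures), resid(exit_poll))[0, 1]

clean = intention                                        # official count = true intention
shaved = intention - np.where(rng.random(n) < 0.3, 0.08, 0.0)  # SI shaved on 30% of centers

print("residual correlation, clean count :", round(residual_corr(clean), 3))
print("residual correlation, shaved count:", round(residual_corr(shaved), 3))
```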


 


Given that this correlation is simply too high and cannot be explained away, they concluded that the only thing the two processes may have in common is fraud.


 


The audit


 


Using statistical theory, HR examined the possibility that some of the voting centers had their votes manipulated while others did not. What they did was to compare the centers that were audited with those that were not. If the audited centers came from the same population, and the sample was random, there should be no difference between the two groups. The result is quite remarkable: the audited centers generated 10% more SI votes than those that were not audited. The probability that this was coincidental is less than 1%. Thus, the rather strong conclusion is that the centers were not chosen at random!
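One simple way to picture this kind of test is a permutation comparison on simulated data (all numbers below are made up for illustration): if the audit sample were truly random, the gap in SI share between audited and non-audited centers should be no larger than what random re-draws of the same size produce:

```python
# Permutation sketch of the audit comparison on simulated data: compare the
# observed gap in SI share between "audited" and non-audited centers with
# the gaps produced by repeatedly re-drawing an audit sample at random.
import numpy as np

rng = np.random.default_rng(4)
si_share = rng.normal(0.40, 0.08, size=4580)           # SI share per center (toy data)
audited = np.zeros(4580, dtype=bool)
audited[rng.choice(4580, size=150, replace=False)] = True
si_share[audited] += 0.04                               # pretend audited centers lean SI

observed_gap = si_share[audited].mean() - si_share[~audited].mean()

gaps = []
for _ in range(10_000):                                 # re-draw the "audit" at random
    mask = np.zeros(4580, dtype=bool)
    mask[rng.choice(4580, size=150, replace=False)] = True
    gaps.append(si_share[mask].mean() - si_share[~mask].mean())

p_value = np.mean(np.abs(gaps) >= abs(observed_gap))
print(f"observed gap: {observed_gap:.3f}, permutation p-value: {p_value:.4f}")
```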


 


HR conclude by saying that in statistics it is impossible to confirm a hypothesis, but you can reject one. They then quote Popper, who said that observing 1,000 white swans does not prove that all swans are white, but if you see a black one, you can reject that hypothesis. To HR, their results amount to having found a black swan: the hypothesis that there was fraud is consistent with their results, and thus they cannot reject it.


 


Well, my feeling is that with Elio’s work, Bruno’s and Raquel’s and some more that are soon to be revealed, what we have is a wedge of black swans getting together and forming! Someone should be getting worried, both here and abroad!


 


