
On Mathematical models of the recall vote and fraud, part VIII: The physicists’ chopped up binomial distribution

September 7, 2004


 


This is a rewrite of last night’s post. Quico wanted it to be clear to 12-year-olds; that might be stretching it, but I hope it works for 21-year-olds:


 


Isbelia Martin and a group of physicists at Simón Bolívar University have been looking at the statistics of the number of votes for each voting machine and by states.


 


The behaviour of the votes in an electoral process should follow what is called a binomial distribution. A binomial distribution arises whenever a process has two possible outcomes; the classical example is flipping a coin. If the coin is fair, half of the time you get heads and half of the time you get tails. You get a distribution when you do an experiment many times: suppose you flip a coin 100 times and record how many heads you get, and then you repeat that whole experiment 100 times. You then record how many times you got only one head (very unlikely), two, three and so on. At the end you divide the frequency of each of these cases by the number of experiments, and you get a probability distribution like this one that I stole from this site:


 



 


 


The voting process is similar in that the voters are, in theory, fairly independent in their decisions. In the case of the voting process, the coin flip happens at each voting center, each of which has some total number of registered voters. So you could construct a probability distribution, much like the one above, in which you plot, as a first simple case, the number of people that actually voted. This is binomial because each voter decides between two choices: to go and vote, or not. Each voter is assumed to be independent of the others, even though there may be family pressures to go and vote. The main difference between this problem and the coin problem is that the probabilities are not 0.5 for each outcome. In fact, in the recall vote abstention was approximately 32%, so you could say that the probability of any given person voting was p = 0.68 and the probability of that person not voting was 0.32.
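The repeated-experiment idea described above can be sketched in Python (standard library only; the machine size of 100 voters and the turnout probability p = 0.68 follow the text, while the number of repetitions is an illustrative choice):

```python
import random
from collections import Counter

rng = random.Random(1)

def turnout_experiment(voters=100, repetitions=1000, p=0.68):
    """For each repetition, count how many of `voters` independent
    voters (each showing up with probability p) actually vote."""
    counts = [sum(rng.random() < p for _ in range(voters))
              for _ in range(repetitions)]
    # Dividing each frequency by the number of repetitions gives the
    # empirical probability distribution described in the text.
    freq = Counter(counts)
    return {k: v / repetitions for k, v in sorted(freq.items())}

dist = turnout_experiment()
print(max(dist, key=dist.get))  # the peak sits near 100 * 0.68 = 68
```

The histogram of these counts is exactly the bell-shaped curve discussed below.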


 


What is different in this problem is that the machines have different sizes, so what you can count is how many people n voted in each machine of size N, and then plot the frequency of occurrences for each machine size N. What you get in the case of the voters in the recall referendum is something very similar to the distribution in the coin-toss problem, which is expected, since both are binomial processes.


 


Mathematically, you can calculate, for a binomial process, the probability that n voters show up to vote at a machine with N registered voters. If the probability that a voter goes to vote is p, then the probability that he or she does not is q = 1 - p. Thus, for machines with N registered voters each, the number of voters n that do go and vote will be between 0 and N and will follow what is called a binomial distribution, given by


 


P(n) = [N! / (n! (N - n)!)] p^n q^(N - n)


 


This is a bell-shaped curve like the one plotted above.
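As a check on the formula, here is a minimal sketch that evaluates P(n) for every n (the machine size N = 500 is an arbitrary illustration; p = 0.68 is the turnout probability from the text):

```python
from math import comb

def binomial_pmf(n, N, p):
    """P(n) = [N! / (n!(N - n)!)] * p^n * q^(N - n), with q = 1 - p."""
    q = 1.0 - p
    return comb(N, n) * p**n * q**(N - n)

N, p = 500, 0.68
pmf = [binomial_pmf(n, N, p) for n in range(N + 1)]

total = sum(pmf)                                # probabilities sum to 1
peak = max(range(N + 1), key=lambda n: pmf[n])  # mode, near N * p = 340
print(round(total, 6), peak)
```

The curve peaks at the mean number of voters, N times p, which is what makes the bell shape above.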


 


Suppose we now plot instead the number of voters that did not go and vote (the abstention) as a function of the number of registered voters at each machine, N. If the distribution is binomial, the points should form a cloud that opens up like the tail of a comet, with the greatest density along an imaginary line whose slope is the average abstention rate in that population. If half the people abstained, this cloud would lie along the line of slope 1/2; since in the case of the recall the abstention was 32%, the cloud lies below that line.
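This comet-shaped cloud is easy to reproduce in a quick simulation (the machine sizes here are synthetic, not the actual electoral registry; the 32% abstention figure is from the text):

```python
import random

rng = random.Random(42)

# Synthetic machines: each machine of N registered voters produces
# Binomial(N, 0.32) abstainers, using the 32% abstention rate.
sizes = [rng.randint(100, 600) for _ in range(2000)]
abstainers = [sum(rng.random() < 0.32 for _ in range(N)) for N in sizes]

# Least-squares slope through the origin: the cloud should hug the
# line n = 0.32 * N, below the slope-1/2 line that a 50% abstention
# would produce.
slope = (sum(n * N for n, N in zip(abstainers, sizes))
         / sum(N * N for N in sizes))
print(round(slope, 3))
```

The fitted slope comes out very close to the abstention rate, which is the "imaginary line" through the densest part of the cloud.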


 


Below is such a plot for the number of people n that did not go and vote, for all the centers in Miranda state, as a function of the number of registered voters per machine N:


 



 


 


 


Plot of the number of voters that abstained as a function of the number of registered voters N at each machine, for Miranda state.


 


This is a textbook example of what one should get from a process that follows a binomial distribution. Thus, the first conclusion: in terms of the choice between going to vote or not, the data from the recall vote behaves, both in Miranda state and nationally, much as expected from a binomial distribution.


 


The same logic should apply to the SI and NO votes: each should follow a binomial distribution, since each represents a choice between two possibilities. If the vote split were a perfect 50%/50% between the Si and the No, and one plotted the number of votes n for one or the other option as a function of the number of actual voters at each machine N, the cloud would spread below the 45-degree line that divides the plane, along the imaginary line of slope 1/2. In the recall vote the No won, so a dispersion plot of the NO votes should give a cloud above that line, with the corresponding cloud of SI votes, also plotted below, sitting beneath it.


 


However, what is observed is completely different as seen in the next graph for the number of NO votes n, in Miranda state as an example, as a function of the number of voters in each machine N:


 



 


 


 


Plot of the number of NO votes as a function of the number of voters N at each machine.


 


 


Instead of a single cloud, one obtains two separate areas of high density with a valley of low density separating them. I have drawn three imaginary lines as guides: one along the valley (the low-density area between the two thicker clouds) and one along each of the two separate clouds on either side.


 


Exactly the same type of behaviour is seen for the number of SI votes n in Miranda state as a function of the number of voters in each voting center N:


 



 


 


 


Plot of the number of Si voters as a function of the number of voters N at each machine


 


 


This shows the same low-density valley, where I have drawn a line to guide the eye, with two clouds on either side.


 


Thus, the Miranda data, which conforms to a binomial distribution when one looks at the binomial process of abstention versus voting, does not conform to one for the Si and No votes. In fact, according to the authors, the Si and No data for Miranda state would NEVER conform to a binomial distribution. This is the second conclusion: the data for the Si and No votes does not conform to a binomial distribution, and never could, even though it is part of the same data set that did conform to a binomial in the case of abstention.


 


Even more interesting, the same type of behaviour is seen in Zulia, Carabobo, Anzoategui, Tachira and Lara, while “textbook” behaviour is found in other states such as Falcon and Vargas. Other smaller states also show classical behaviour. This creates a big problem: how would one explain that some states behave exactly like a binomial, textbook cases with no discrepancies, while certain selected states do not?


 


In order to try to understand this unusual behaviour, the authors plotted the histogram of occurrences for the Si and NO votes as shown in the next figure:


 



 


Histogram of the occurrences of the Si (red) and No (blue) votes as a function of the number of votes.


 


There are two distributions plotted in this figure: the red bars are the distribution of occurrences of the number of Si voters for each machine with N voters, and the blue bars the corresponding distribution for the number of NO voters. As you can see, it is as if the Si distribution had had a piece chopped out of it for machines in which the number of registered voters was above 250 and up to 350. This data is for Miranda state, but if one looks at a similar histogram at the national level, the same type of “chopped up” binomial distribution is observed. This is the third conclusion: the distribution is a binomial that appears to have had part of it “chopped” out, as if part of the Si votes were shifted to No votes.
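What such a "chop" does to a clean binomial is easy to illustrate. The sketch below is purely hypothetical, NOT the actual data: it draws per-machine counts from a binomial and then shifts votes away from half the machines in a chosen window, the mechanism the authors suggest:

```python
import random

rng = random.Random(7)

def binom(N, p):
    """One draw from Binomial(N, p)."""
    return sum(rng.random() < p for _ in range(N))

# Illustrative only: clean per-machine SI counts, then a "chop":
# for half the machines whose count lands in the 250-350 window,
# shift 80 votes away (as if SI votes became NO votes).
counts = [binom(600, 0.5) for _ in range(2000)]
chopped = [c - 80 if 250 <= c <= 350 and rng.random() < 0.5 else c
           for c in counts]

def window(cs, lo, hi):
    """How many machines have a count inside [lo, hi]."""
    return sum(1 for c in cs if lo <= c <= hi)

# The tampered histogram is visibly depleted in the chopped window,
# producing the kind of valley seen in the dispersion plots.
print(window(counts, 290, 310), window(chopped, 290, 310))
```

The depleted window in the histogram and the valley in the scatter plots are two views of the same missing mass.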


 


It is this same chopping that accounts for the valleys in the two unusual dispersion plots.


 


Thus, it would seem as if the process is not at all binomial, as it should be, but instead follows a distribution into which something artificial appears to have been selectively introduced, creating two types of distribution. Curiously, the abstention, part of the very same data, behaves properly as a binomial. This result is consistent with the hypothesis of Hausmann and Rigobon that only a certain number of machines may have been manipulated; in this case, the data suggests that the number of registered voters per machine determined whether a machine's data was manipulated or not.

On Mathematical models of the recall vote and fraud, part VII: Hausmann and Rigobon, a wedge of black swans?

September 6, 2004

Yesterday, Sumate held a press conference which I did not mention because I simply did not understand what they had done in terms of proving, or not, the existence of fraud. It was based on a study by Ricardo Hausmann of Harvard University and Roberto Rigobon of MIT (HR) which was made available (at least to me!) today. The study is quite technical, so I will not try to explain it in detail but only give you an idea of what they did. The report is entitled: Searching for a black swan: Analysis of the statistical evidence about electoral fraud in Venezuela.


Beyond the technical details, the report repeats a number of issues which are important in understanding the possibility of fraud; it also adds some information which should be here for the record. According to HR, these are the elements that lead to a presumption of fraud:


 


1-The opposition wanted a manual count from the beginning, but the count was electronic.


2-There was no manual counting of the printed ballots; instead there was to be a “hot audit” of 1% of the ballot boxes, which never took place. Only 78 boxes were counted, with the opposition present for the counting of 28.


3-International observers were not allowed in the totalization room; neither was the opposition.


4-The voting machines had bidirectional communications. (My note: The Head of Smartmatic said this was not the case in his press conference the week after the vote).


5-Contrary to what was agreed on, the voting machines communicated with the servers before printing the results.


6-Exit polls disagree with the results. (I add: And those whose details have been made available agree with each other)


7-In the second audit the random selection of boxes was made using the software provided by the CNE.


 


Exit Polls


 


The report provides interesting data on them. For example:


 


Percentage of SI votes in Sumate’s exit poll: 59.5%

Percentage of SI votes according to the CNE in centers where Sumate did exit polls: 42.9%

Percentage of SI votes in PJ’s exit poll: 62.6%

Percentage of SI votes according to the CNE in centers where PJ did exit polls: 42.9%


 


The point is that people always dismiss exit polls by suggesting they were not done in the right places. Well, given that the SI received 40.63% at the national level, the official results at the centers where the exit polls were performed (42.9%) are not that different, so those centers were not unrepresentative.


 


Caps or coincidences


 


They test statistically for the caps and conclude that if there was fraud, it was not done by imposing caps on the maximum number of votes per machine. For quite a while I have referred to the caps as coincidences, believing they may be a consequence of the fraud rather than the means by which it was perpetrated. This would agree with that.


 


Detection of statistical fraud


 


What HR did was to look at how to measure the intention of the vote. To do that, they considered two independent measures of vote intention: the exit poll results and the signatures from the Reafirmazo process. The idea is that each of these represents real data on vote intention, distinct from the actual vote. They then ran a regression between signatures per center and the actual vote at those centers, and the same regression between exit polls and the actual vote at those centers. In a regression you calculate the line, or equation, that best fits all of the points in the data you have; i.e., in the case of the exit polls, the line that best fits the results announced by the CNE at those same centers.


 


When the above two regressions are done there are errors, that is, differences between the line and the points. But the sources of these errors are independent for the two processes. The only way in which they could be correlated (similar) is if the errors have a common source, and in this case fraud is the only plausible source of “correlation” between the two. Well, the mathematical comparison of the regression errors yielded a correlation of 0.24, on a scale where two things that behave exactly the same have correlation 1 and two that have nothing to do with each other have correlation zero. To put it simply: in voting centers where the signatures predicted a higher number of Si’s than the vote actually recorded, the exit polls also predicted a higher number of Si votes.
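The residual-correlation idea can be sketched with made-up numbers (these are synthetic centers, not HR's data; the "shock" variable stands in for tampering that enters the recorded vote but neither independent measure):

```python
import random

rng = random.Random(0)

# Synthetic centers: a hidden common shock enters the recorded vote
# but neither of the two independent measures of vote intention.
n = 400
intent = [rng.gauss(50, 10) for _ in range(n)]       # true intention
shock = [rng.gauss(0, 3) for _ in range(n)]          # common tampering
signatures = [x + rng.gauss(0, 4) for x in intent]   # noisy measure 1
exit_polls = [x + rng.gauss(0, 4) for x in intent]   # noisy measure 2
votes = [x + s for x, s in zip(intent, shock)]       # recorded vote

def residuals(x, y):
    """Residuals of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    return [b - (my + beta * (a - mx)) for a, b in zip(x, y)]

def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) ** 0.5
                  * sum((b - mv) ** 2 for b in v) ** 0.5)

r = corr(residuals(signatures, votes), residuals(exit_polls, votes))
print(round(r, 2))  # clearly positive: both residual sets share the shock
```

With independent errors and no shock, this correlation would hover near zero; a common disturbance in the vote is what pushes it up, which is the logic behind HR's 0.24.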


 


Given that this correlation is simply too high and cannot be explained away, they concluded that the only thing the two processes could have in common is fraud.


 


The audit


 


Using statistical theory, HR tested the possibility that some of the voting centers had their votes manipulated while others did not. What they did was to compare the centers that were audited with those that were not. If the audited centers came from the “same” population there should be no difference, as the sample should be random. The result is quite remarkable: the centers that were audited generated 10% more SI votes than those that were not audited. The probability that this is coincidental is less than 1%. Thus, the rather strong conclusion is that the centers were not chosen at random!
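One standard way to make this kind of comparison is a permutation test, sketched here on synthetic SI shares (the numbers are invented to mimic the reported 10% gap, and are not HR's actual method or data):

```python
import random

rng = random.Random(3)

# Synthetic per-center SI shares: audited centers drawn with a 10%
# higher mean, mimicking the discrepancy HR report.
audited = [rng.gauss(0.50, 0.08) for _ in range(200)]
not_audited = [rng.gauss(0.40, 0.08) for _ in range(800)]

observed = (sum(audited) / len(audited)
            - sum(not_audited) / len(not_audited))

# Permutation test: shuffle the audited/non-audited labels and see how
# often a purely random split reproduces a gap at least this large.
pool = audited + not_audited
trials, count = 2000, 0
for _ in range(trials):
    rng.shuffle(pool)
    diff = sum(pool[:200]) / 200 - sum(pool[200:]) / 800
    if diff >= observed:
        count += 1
p_value = count / trials
print(round(observed, 3), p_value)
```

A truly random audit selection essentially never shows a gap this size, which is the sense in which "less than 1%" rejects randomness.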


 


HR conclude by saying that in statistics it is impossible to confirm a hypothesis, but you can reject one. They then quote Popper, who said that observing 1000 white swans does not prove all swans are white, but if you see a black one, you can reject that hypothesis. For HR, the result is that they found a black swan: the hypothesis that there was fraud is consistent with their results, and thus they cannot reject it.


 


Well, my feeling is that with Elio’s work, Bruno’s and Raquel’s and some more that are soon to be revealed, what we have is a wedge of black swans getting together and forming! Someone should be getting worried, both here and abroad!


 





On Mathematical models of the recall vote and fraud, part V: Prof. Taylor posts correction, changing Carter Center conclusions

September 5, 2004

A reader, Mercedes Rosas, has pointed out in the comments below a correction on the webpage of Prof. Jonathan Taylor of Stanford University, and an e-mail from Prof. Taylor to her, on the results of the recall vote, in which he says he made an error earlier and corrects his results. Prof. Taylor’s work was used by the Carter Center to “show” that the number of “Si” coincidences in the mesa (table) votes was reasonable. This result was widely used and quoted by the international press as part of the “evidence” that there was no fraud in the recall vote.


As I have reported elsewhere, Elio Valladares got quite a different result, suggesting that the probability was not that “reasonable”; in fact, Elio found that it was small, if not minuscule. Now Prof. Taylor has corrected his work on his web page, and I would like to quote him so that there is no misinterpretation of what he did or did not say:


 


“It seems that an expected number of ties between 345 and 350 is reasonable, as it came out from many different models. Using the Poisson assumption to estimate the standard error, it seems then that the probability of observing 402 or more ties for SI is between 1 and 3 in 1000. While this probability is small, I do not feel that it should be interpreted as overwhelming evidence of fraud.”


 


Yes, it is not overwhelming, but we have gone from reasonable to small, and it was the “reasonable” that led the Carter Center to its conclusion. What would they say now? By the way, the CNE also used this result by Prof. Taylor to say that the Si vote coincidences were irrelevant.
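Prof. Taylor's corrected figure is easy to reproduce with a short Poisson tail calculation (taking the expected number of ties as 347.5, the midpoint of his quoted 345-350 range, which is my choice, not his):

```python
import math

lam = 347.5      # expected number of SI ties (midpoint of 345-350)
observed = 402

# P(X >= observed) for X ~ Poisson(lam), accumulating the pmf
# iteratively to avoid enormous factorials:
#   P(X = k) = P(X = k-1) * lam / k
pmf = math.exp(-lam)   # P(X = 0)
cdf = pmf
for k in range(1, observed):
    pmf *= lam / k
    cdf += pmf
tail = 1.0 - cdf       # P(X >= 402)
print(tail)  # about 2 in 1000, consistent with "between 1 and 3 in 1000"
```

So the quoted "between 1 and 3 in 1000" follows directly from the Poisson assumption he describes.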


 


Prof. Taylor has acted with the integrity characteristic of scientists. I wonder if the Carter Center will post a clarification to its conclusions, but I doubt it. I sure hope Prof. Taylor will now calculate the coincidences at the machine level; I believe in that case he will find that the probability is even lower, if not impossible! That should have been the case the Carter Center had him study to begin with!


 


Note added: I have now received a new study by Elio Valladares in which, if I understand correctly, he simulates the vote at the level of the cuadernos (each cuaderno is a machine in centers with electronic voting) rather than at the mesa level. He then calculates the probabilities at the mesa level using the results from the simulation of the cuadernos. His conclusion is that the probability of the observed number of SI vote coincidences is about 1 in 10,000, which I think even Prof. Taylor would consider to be getting into the realm of “overwhelming evidence of fraud”.


 


Second note added: Prof. Taylor has now removed the word “overwhelming” from his conclusions. I wish that rather than worrying about words, they would work on the real problem at the machine level.

On Mathematical models of the recall vote and fraud, part V: Prof. Taylor posts correction, changing Carter Center conclusions

September 5, 2004

A reader, Mercedes Rosas,  has pointed out in the comments below to a correction in the webpage by Prof. Jonathan Taylor of Stanford University and an e-mail from Prof. Taylor to her, on the results of the recall vote in which he says he made an error earlier and corrects hs results. Prof. Taylor’s work was used by the Carter Center to “show” that the number of “Si” coincidences in the mesa (table) votes was reasonable. This result was widely used and quoted by the international press as part for the “evidence” that there was no evidence of fraud in the recall vote.


As I have reported elsewhere, Elio Valladares got quite a different result, suggesting that the probability was not that “reasonable”, in fact Elio obtained that it was small, if not miniscule. Now Prof. Taylor has corrected his work on his web page and I would like to quote him so that there is no misinterpretation of what he says or not:


 


“ It seems that an expected number of ties between 345 and 350 is reasonable, as it came out from many different models. Using the Poisson assumption to estimate the standard error, it seems then that the probability of observing 402 or more ties for SI is between 1 and 3 in 1000. While this probability is small, I do not feel that it should be interpreted as overwhelming evidence of fraud.”


 


Yes, is not overwhelming, but we have gone from reasonable to small, but it was the reasonable that led the Carter Center to its conclusion. What would they say now?. By the way, the CNE also used this result by Prof. Taylor to say that the Si vote coincidences were irrelevant.


 


Prof. Taylor has acted with the integrity characteristic of scientists. I wonder if the Carter Center will post a clarification to its conclusions, but I doubt it. I sure hope Prof. Taylor will now calculate the coincidences at the machine level; I believe in that case he will find that the probability is even lower, if not impossible! That is the case the Carter Center should have had him study to begin with!


 


Note added: I have now received a new study by Elio Valladares in which, if I understand correctly, he simulates the vote at the level of the cuadernos (each cuaderno is a machine in centers with electronic voting) rather than at the mesa level. He then calculates the probabilities at the mesa level using the results from the simulation of the cuadernos. His conclusion is that the probability of the observed number of SI vote coincidences is 1 in 10,000, which I think even Prof. Taylor would consider gets us into the realm of “overwhelming evidence of fraud”.


 


Second note added: Prof. Taylor has now removed the word “overwhelming” from his conclusions. I wish that rather than worrying about words, they would work on the real problem at the machine level.

On Mathematical studies of the recall vote and fraud: Part IV

September 2, 2004

A professor from Princeton and two from Johns Hopkins have done a calculation similar to the one by Elio Valladares that I posted yesterday. Their conclusions are found here in English, and while they are quantitatively very similar to Elio's, they differ in their conclusion.


They did a very similar calculation to Elio's, but with only 1,238 simulations, compared to the 10,000 Elio did. Their results are very similar: they found the distribution had an average of 360.90 for the Si coincidences and 317 for the No coincidences. This means that the occurrence of the No coincidences is reasonable, but that of the Si coincidences is less likely.


 


However, the authors feel comfortable that 2.3 standard deviations away from the mean is feasible and thus feel this proves little. Elio's results are much more conclusive; it is unclear if that is because he ran almost eight times more simulations, but in the interest of discussion and comparison, here are the results.
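For reference, 2.3 standard deviations is not such a comfortable distance. Assuming the simulated distribution is roughly normal (my assumption, not a claim from their paper), the one-sided probability of landing that far above the mean is about 1 in 100:

```python
import math

def normal_tail(z):
    """One-sided tail P(Z >= z) for a standard normal, via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p = normal_tail(2.3)
print(f"P(Z >= 2.3) ~ {p:.4f}")  # roughly 0.011, i.e. about 1 in 100
```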


 


I continue to believe that simulations at the Center level would be more conclusive than these, as there seems to be a high concentration of coincidences at the center level. 


On Mathematical studies of the recall vote and fraud: Part III

September 1, 2004

Elio Valladares, who is at the University of Virginia, has completed a simulation that is very interesting because it looks at the problem of coincidences at the “mesa” (table) level, rather than at the machine level.


Recall that while the CD was talking about the anomalies in the number of coincidences at the center level, the Carter Center and the CNE were quick to claim that there was no such anomaly and that the results were reasonable. Recall also that each center may have a number of “mesas” (tables) and that each table may have one or more machines. Thus the two sides seemed to be talking about two different things: coincidences in the tables, of which there were 402 for the Si's and 311 for the No's in our review of the machine results, or coincidences in the machines, of which there were 805 for the Si's at the center level and 647 for the No's. However, the coincidences for the Si's at the center level translated to 1879 machines.


 


In its final report, the Carter Center said that it had consulted a professor from Stanford University, who we understand was Jonathan Taylor of that university's Department of Statistics. According to the final report by the Carter Center, these results based on the table coincidences are “probable”. However, no details have ever been given of how exactly this conclusion was reached. In the meantime, studies showed that the coincidences at the machine level were not that probable.


 


What Valladares has done is to simulate the probability of a coincidence using the total number of votes at each table, taken from the results of the referendum. He then uses these real numbers to simulate 10,000 elections and counts the number of times these coincidences occur. The results of this calculation are shown in the plot below:


 



 


This is the total number of coincidences seen at the table level; the distribution peaks around 345 and is fairly narrow. Valladares concludes that the probability of having 393 coincidences for the Si, which according to him is the number of Si coincidences per table reported by the CNE, is 0.0028. Our calculation is that the number is 401, which is even less likely to occur.
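The structure of a simulation like this is straightforward to sketch. The layout and probabilities below are entirely made up for illustration (the real calculation uses the actual voters per table and the actual machine assignments); the point is only to show how tie counts are generated and tallied in each simulated election:

```python
import random

def count_si_ties(mesas, p_si, rng):
    """One simulated election: for each mesa, draw a Si count per machine
    and flag the mesa if two or more machines report identical counts."""
    ties = 0
    for machine_sizes in mesas:
        counts = [sum(rng.random() < p_si for _ in range(n))
                  for n in machine_sizes]
        if len(set(counts)) < len(counts):  # a coincidence at this mesa
            ties += 1
    return ties

# toy layout: 100 mesas, 2-3 machines each, ~150 voters per machine (made up)
rng = random.Random(1)
mesas = [[150 + rng.randrange(-20, 20) for _ in range(rng.choice([2, 3]))]
         for _ in range(100)]
results = [count_si_ties(mesas, 0.4, rng) for _ in range(100)]
print(min(results), sum(results) / len(results), max(results))
```

Repeating this thousands of times gives a histogram of tie counts like the one above, against which the observed number can be compared.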


 


Even more interesting is the fact that the number of “No” coincidences in the same calculation is not found to be so unlikely, with a probability of 0.17 of finding 311 cases in which the No's coincide.


 


It would be interesting to know if either Prof. Taylor or the Carter Center have anything to comment on this, as Valladares' results contradict their conclusion and tend to support quantitatively the thesis that there was some form of fraud on Aug. 15th.
