On Thursday, the second Simon Bolivar University seminar on Statistical Analysis of the referendum process was held. There were supposed to be three talks, but nature conspired against Luis Raul Pericchi, who was in Puerto Rico and could not travel to Venezuela because of Hurricane Jeanne. The organizers then planned a videoconference, but unfortunately the island lost all electric power, so it could not be set up. His talk has been tentatively rescheduled for next Thursday.
You can find the program for these conferences here. I thought all the presentations would be posted there, but so far only one of them has been; more on that one later.
-There was a talk by Rafael Torrealba from the Math Department at Universidad Centro Occidental Lisandro Alvarado. The talk would have been useful two or three weeks ago, but by now its model is too simplistic. Basically, Torrealba calculated the probability of coincidences assuming that every machine has 500 voters and approximating the binomial distribution by a “box”: a flat probability within one standard deviation of the mean and zero probability outside it. With this approximation, Torrealba found that coincidences were as likely as those observed in the recall vote. He cited Rubin’s work but was unaware of the work by Taylor, Valladares and Jimenez. At this point, such a crude approximation cannot settle the question.
Torrealba also showed some voter distributions from the Barquisimeto area where he lives to discuss the implications of applying a binomial distribution.
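To see how much the “box” shortcut can matter, here is a minimal sketch comparing it with the exact probability that two machines coincide. It assumes 500 voters per machine, as in the talk, and a Si share of p = 0.4 that is chosen only for illustration. Each machine's Si count is treated as an independent Binomial(n, p) draw. The function names are hypothetical.

```python
from math import ceil, comb, floor, pi, sqrt


def exact_coincidence(n: int, p: float) -> float:
    """P(two independent Binomial(n, p) counts are equal) = sum_k P(X = k)^2."""
    return sum((comb(n, k) * p**k * (1 - p) ** (n - k)) ** 2 for k in range(n + 1))


def box_coincidence(n: int, p: float) -> float:
    """Same probability under the 'box' approximation: each count is uniform
    over the integers within one standard deviation of the mean, zero elsewhere."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    width = floor(mu + sigma) - ceil(mu - sigma) + 1  # integers inside the box
    return 1 / width


def normal_coincidence(n: int, p: float) -> float:
    """Normal-approximation cross-check: the integral of f(x)^2 is 1 / (2 * sigma * sqrt(pi))."""
    sigma = sqrt(n * p * (1 - p))
    return 1 / (2 * sigma * sqrt(pi))


if __name__ == "__main__":
    n, p = 500, 0.4  # hypothetical machine size and Si share
    print(f"exact  : {exact_coincidence(n, p):.4f}")
    print(f"box    : {box_coincidence(n, p):.4f}")
    print(f"normal : {normal_coincidence(n, p):.4f}")
```

With these parameters, the box spreads the probability over only 21 possible values. Its coincidence probability is therefore 1/21, about 0.048, against an exact value of about 0.026. That means the box roughly doubles the chance of a tie before any real data are considered.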
-There was a second talk by Isbelia Martin on the binomial distribution and the recall vote. She gave a more complete presentation of the results I summarized here, with much more material than I showed; if she posts her presentation online, I will link to it here.
She presented the data for Vargas State, which behaves like a textbook binomial state, and compared it with the Miranda State data I presented. The data show more anomalies than the ones I discussed. For example, if one fits a line through each of the “clouds” of results to obtain the average for each cloud, the lines do not pass through zero as they should. In addition, she and her colleagues found that in some cases the same center has machines in both clouds, which obviously makes no sense.
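Why should the fitted lines pass through zero? If each machine's Si count is roughly a fixed share p of its total votes, then Si ≈ p × total, which is a line through the origin. A least-squares fit through a cloud should therefore return an intercept near zero. The minimal sketch below performs that check on synthetic binomial data. The data are not from the referendum, and both the share p = 0.4 and the machine sizes are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def synthetic_cloud(n_machines, p, seed=0):
    """Synthetic machines: total votes uniform in [100, 600], Si votes ~ Binomial(total, p)."""
    rng = random.Random(seed)
    totals = [rng.randint(100, 600) for _ in range(n_machines)]
    si = [sum(rng.random() < p for _ in range(t)) for t in totals]
    return totals, si


if __name__ == "__main__":
    totals, si = synthetic_cloud(200, 0.4)
    intercept, slope = fit_line(totals, si)
    # For a purely binomial cloud the intercept should be close to 0 and the slope close to p.
    print(f"intercept = {intercept:.2f}, slope = {slope:.3f}")
```

If the same constant were added to every machine in a cloud, the slope would not change and the whole offset would appear in the intercept. That is why an intercept well away from zero stands out. The sketch says nothing about what produced the offset in the real data.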
-Jimenez, Jimenez and Marcano have now posted a simplified version of their work on coincidences here. I wish everyone would make their work available this way; it would make the discussion livelier and more interesting.
What they have done is essentially a bootstrap method. This is basically a simulation of the vote that uses the actual data from the recall referendum and models the structure of the centers, tables (mesas) and machines in detail. They allow all variables to fluctuate, so they do not have to assume the data are random, which they would not be if the data had been tampered with.
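The paper's simulation is more elaborate than this, but the core idea can be sketched for a single center. Assume, hypothetically, that we know how many ballots each machine in a center received and how many Si votes each machine reported. The sketch redraws each machine's ballots with replacement from the center's pooled ballots, in bootstrap style, and records whether any two machines tie on Si votes. The function names and the center layout are illustrative and are not taken from Jimenez, Jimenez and Marcano.

```python
import random


def has_si_tie(si_counts):
    """True if at least two machines in the center report the same number of Si votes."""
    return len(set(si_counts)) < len(si_counts)


def resample_center(si_counts, totals, rng):
    """Redraw each machine's ballots with replacement from the center's pooled ballots.

    A ballot drawn from the pool is a Si vote with probability equal to the
    center's observed Si share; machine sizes (totals) are kept fixed.
    """
    share = sum(si_counts) / sum(totals)
    return [sum(rng.random() < share for _ in range(t)) for t in totals]


def tie_rate(si_counts, totals, n_sims=1000, seed=0):
    """Fraction of resampled copies of the center that show at least one Si tie."""
    rng = random.Random(seed)
    hits = sum(has_si_tie(resample_center(si_counts, totals, rng)) for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical two-machine center: 500 ballots per machine, 210 Si votes each.
    print(tie_rate([210, 210], [500, 500], n_sims=2000))
```

Because the ballots are redrawn from the center's own pool, the sketch does not have to assume any particular national Si share. A full version would repeat this for every center and also track No and total-vote ties.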
Jimenez et al. also carry out a more detailed calculation. They look not only at the number of coincidences in the Si or No votes, but at coincidences in the Si, No and total votes, and they compare the probability of coincidences for each type of center. That is, they do not just calculate how many centers had coincidences in two machines. They calculate how many centers with two machines had coincidences in any of the three numbers (Si, No or total votes), how many centers with three machines did, how many with four, and so on. This gives a wider set of probabilities for comparing the real data with the simulations.
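The bookkeeping described above can be sketched as follows: for each center size, count how many centers show a repeat in Si, No or total votes. The input format is assumed for illustration: a list of centers, each a list of (Si, No) pairs, one per machine. The toy data are made up.

```python
from collections import defaultdict


def repeats(values):
    """True if at least two machines share the same value."""
    return len(set(values)) < len(values)


def tabulate_coincidences(centers):
    """Map number of machines -> counts of centers with Si, No or total-vote repeats.

    Each center is a list of (si, no) tuples, one tuple per machine.
    """
    table = defaultdict(lambda: {"centers": 0, "si": 0, "no": 0, "total": 0})
    for machines in centers:
        row = table[len(machines)]
        row["centers"] += 1
        si = [s for s, _ in machines]
        no = [n for _, n in machines]
        total = [s + n for s, n in machines]
        row["si"] += int(repeats(si))
        row["no"] += int(repeats(no))
        row["total"] += int(repeats(total))
    return dict(table)


if __name__ == "__main__":
    toy_centers = [
        [(120, 300), (120, 310)],             # Si repeat
        [(100, 200), (150, 150)],             # total-vote repeat
        [(90, 210), (95, 205), (80, 220)],    # total-vote repeat, three machines
    ]
    print(tabulate_coincidences(toy_centers))
```

Running the same tabulation on the real vote and on each simulated vote gives, for every center size, the numbers that are compared in the next step.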
They then ran 1238 simulations and calculated the same probabilities for centers with 2 to 11 machines. They found that, in general, the proportion of coincidences is higher in the actual vote than in the simulations. This led them to do a test of ranges, calculating, for centers with n = 2, 3, 4, … 11 machines, the probability that the number of coincidences observed in the recall vote would occur. The question is thus not simply how likely two machines are to coincide, but how likely it is that centers with two machines had the level of coincidences observed.
The results are in Table 3 of their paper; here I summarize some cases:
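One way to read such a test is as an empirical, one-sided p-value. It is the fraction of simulated referendums in which centers of a given size show at least as many coincidences as the real vote. The minimal sketch below assumes the probabilities were computed that way; the paper may define the tail differently. The simulated counts are made up.

```python
def empirical_p_value(observed, simulated):
    """Fraction of simulations with at least as many coincidences as observed."""
    return sum(1 for s in simulated if s >= observed) / len(simulated)


if __name__ == "__main__":
    # Hypothetical coincidence counts from eight simulated referendums.
    simulated = [3, 5, 4, 6, 2, 5, 7, 4]
    print(empirical_p_value(6, simulated))  # two of eight simulations reach 6
    print(empirical_p_value(9, simulated))  # no simulation reaches 9
```

Under this reading, a probability of zero means that none of the simulations reached the observed count. With 1238 simulations, that puts the estimated probability below roughly 1/1238 ≈ 0.0008.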
Centers with two machines: The probability of observing the number of Si coincidences seen was 0.0323, the probability for the No coincidences was 0.7746, and the probability for the total-vote coincidences was 0.0638. Thus, although the Si and total-vote probabilities are low, that many coincidences was still plausible.
Centers with four machines: The probability of observing that number of Si coincidences was ZERO, while the probability for the No coincidences was 0.2883 and for the total-vote coincidences 0.00807. Similarly low probabilities were found for total-vote coincidences in centers with six and seven machines, and extremely low probabilities for Si coincidences in centers with six machines.
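Assume each probability in Table 3 is a number of simulations divided by 1238. Under that assumption, the figures quoted above can be converted back into counts. This is only a consistency check on the numbers as quoted here; the paper does not state it.

```python
N_SIMS = 1238  # number of simulations reported by Jimenez, Jimenez and Marcano

# Probabilities as quoted for centers with two and four machines.
reported = {
    "2 machines, Si": 0.0323,
    "2 machines, No": 0.7746,
    "2 machines, total": 0.0638,
    "4 machines, Si": 0.0,
    "4 machines, No": 0.2883,
    "4 machines, total": 0.00807,
}

for label, prob in reported.items():
    implied = prob * N_SIMS  # implied number of simulations at or above the observed count
    print(f"{label:18s} p = {prob:<7} -> about {implied:6.1f} of {N_SIMS} simulations")
```

Each value lands within about 0.1 of a whole number (40, 959, 79, 0, 357 and 10), which fits that reading.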
The authors conclude:
-The repetitions observed in the Si votes and in the total number of voters per machine within a center are considerably larger than expected. This is strange, but plausible.
-The repetitions observed in the No votes are entirely credible and, in many cases, close to what was expected.
-The repetitions observed in the Si votes in centers with four machines and in the number of voters in centers with six machines are the extreme cases in their analysis. In these cases the authors CANNOT accept the hypothesis that the repetitions are due to chance.
This last conclusion is the strongest so far in the study of coincidences in the number of votes within a single center: it says the data could not have been the product of chance.