Archive for September, 2004

Electoral Registry Anomalies in the recall vote

September 13, 2004

While it is difficult to “prove” there was fraud in the recent recall vote, the team studying the Electoral Registry has detected quite a few anomalies that should be of concern, as they seem implausible from any perspective. There are few possible explanations for the following numbers, more so when you consider that the census population includes every inhabitant of these towns, from babies to the elderly. The first column shows each town's population in the 2001 census, the second its 2004 population according to the INE (Instituto Nacional de Estadistica), and the third the number of people registered to vote there. In some cases that number exceeds the ENTIRE population of the town:


 


                       2001 Census    2004 INE    Registered Voters    Percentage

Amazonas State
  Maroa                     1,731       1,821            2,112            118%
  Rio Negro                 2,348       2,515            1,960             81%

Anzoategui State
  Libertad                 13,980      14,301           12,088             80%

Aragua State
  Santos Michelena         39,371      41,941           63,518            153%
  Ocumare de la Costa       8,418       9,016            7,449             84%

Falcon State
  Sucre                     5,400       5,243            5,164             90%
  Tocopero                  4,688       4,971            4,654             93%

Monagas State
  Acosta                   17,568      17,973           22,490            118%
  Aguasay                  10,786      11,988            9,452             80%
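The Percentage column appears to be registered voters divided by population, though the post does not say which base was used. A quick check of three rows against both bases (note the ratios come out a point or two off the table's figures, so the exact base remains unclear):

```python
# Registered voters as a share of population, using three rows from the table.
# Computes the ratio against both the 2001 census and the 2004 INE figure,
# since the table does not say which base its Percentage column uses.
towns = {
    "Maroa":            (1731, 1821, 2112),
    "Santos Michelena": (39371, 41941, 63518),
    "Tocopero":         (4688, 4971, 4654),
}
for name, (census01, ine04, registered) in towns.items():
    print(f"{name:18s} vs 2001 census: {100 * registered / census01:5.1f}%   "
          f"vs 2004 INE: {100 * registered / ine04:5.1f}%")
```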


A total of 100 towns were found to have problems like those above. You be the judge.

New category: RR Studies

September 12, 2004

I have created a new category for the studies of the recall referendum, which you can see in the left column of categories as RR Studies. I have placed there all of the articles and posts on the studies from my page, so that they are in a single location. I will not post articles from other sources there. Hope it helps.

Dave Barry and electronic voting

September 12, 2004

Don’t forget to read Dave Barry’s article today on electronic voting. At least it may give you a laugh about our problems.


Seminar at Simon Bolivar University: Two presentations

September 12, 2004

Besides the overview presentation by Isabel Llatas, there were two talks at this first seminar:


1)      Statistical study of the CNE data by Bernardo Marquez et al.


 


This is a group of engineers who looked at the statistical properties of the electronic results at the two lowest levels of detail: the center level and the parish level.


 


Basically, the CNE divided the nation into 321 municipalities. Each municipality was itself divided into parishes, and the parishes into centers. There were on average 2.6 parishes per municipality and 5.4 centers per parish, and on average 10,098 votes per center and 26,486 votes per parish.


 


The study was a statistical hypothesis test of all of the CNE data. The basic hypothesis was that the CNE data is valid, and thus by looking at averages and standard deviations one should be able to establish confidence levels at both the parish and the center level, in terms of the votes within a parish or center being in the correct range. That is, they take the final result of a machine and check whether it falls within what is expected from the statistics of its center or parish.
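As I understand it, the check amounts to flagging machines whose tallies fall outside the range the group's own mean and standard deviation would predict. A minimal sketch, assuming a simple normal approximation (the function name and the data are my own illustration, not the authors' code):

```python
import statistics as st

def flag_unexpected(tallies, z_crit=1.96):
    """Return tallies lying outside mean +/- z_crit * std of their group.
    z_crit = 1.96 corresponds roughly to a 95% confidence level."""
    mu = st.mean(tallies)
    sd = st.stdev(tallies)
    if sd == 0:
        return []
    return [t for t in tallies if abs(t - mu) / sd > z_crit]

# SI-vote percentages of the machines in one hypothetical center:
shares = [41.2, 40.8, 41.5, 40.9, 41.1, 41.3, 70.0]
print(flag_unexpected(shares))  # prints [70.0]
```

Note that a single extreme machine also inflates the group's standard deviation, which can mask smaller outliers; presumably the real study handles this more carefully.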


 


The authors found that, at a 95% confidence level, only 7% of the machines show unexpected results with respect to their center. In contrast, they found that 62% of the parishes showed unexpected results. At a 99% confidence level, they found 51% of the machines had unexpected results.


 


They then looked at how much the centers differed within each parish, by comparing the standard deviations of the individual centers. They eliminated what they called the non-homogeneous parishes, those in which the centers showed significant differences in the standard deviations of their distributions. Keeping only the “homogeneous” parishes, they found that at a 95% confidence level 42% of the machines showed unexpected results, and at 99% confidence 26% did.


 


2)      A study of the coincidences in the votes in the machines, by Raul Jimenez (USB), Alfredo Marcano (USB) and Juan Jimenez (UCV)


 


This talk discussed the various simulations that have been done to study the coincidences. It was very critical of Rubin's and Taylor's from the technical point of view. I must say I was not able to understand the details of what they did; it was beyond my understanding, and I tried. Basically, they are using fairly sophisticated mathematical theory to look at the problem and study the probabilities of occurrences.


 


In their most detailed work, they looked at the probability of SI and NO coincidences, as well as the probability that the sums of the SI and NO votes also coincide. They obtained a probability of 3.5 in 10,000 for the SI coincidences, a reasonable one (I think it was 0.3) for the NO, and 1 in 1,000,000 for the sums of SI and NO to coincide.
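This is not the authors' calculation, but a toy Monte Carlo (with made-up turnout and SI-share parameters) gives a feel for how often exact ties between machines would arise under a simple independent-votes model:

```python
import random

def tie_probability(n_machines=3, voters=550, p_si=0.41,
                    trials=4000, seed=1):
    """Estimate the chance that at least two machines in a center report
    exactly the same SI tally, if each tally were an independent
    Binomial(voters, p_si) draw. All parameters here are invented for
    illustration, not taken from the CNE data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tallies = [sum(rng.random() < p_si for _ in range(voters))
                   for _ in range(n_machines)]
        if len(set(tallies)) < n_machines:
            hits += 1
    return hits / trials

print(tie_probability())  # on the order of a few percent
```

Under this naive model occasional ties are expected; the interesting question the speakers addressed is whether the observed pattern of ties is far more frequent than such a model allows.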


 


This result is being submitted as a scientific paper next week, and the author said he will send me a copy when he sends it in to the journal.


The land the Government works on by Carlos Machado Allison

September 12, 2004

Carlos Machado Allison is a retired professor from Universidad Central de Venezuela who now works at IESA. Carlos' specialty is land use. Personally, I have never heard anyone in Venezuela talk about land use with Carlos' knowledge, so I thought I would translate parts of his article this week in El Universal on the issue:


 


The land the Government works on by Carlos Machado Allison


 


Agricultural producers know agrarian demagoguery, with its vote-capturing potential and international sympathy, at the expense of destroying trust. Its goal is not to improve production, productivity or the capacity to feed the population; after all, powerful and with a good flow of US dollars, the state can continue importing and selling at a loss as long as it creates sympathies among the poor. It is a matter of perpetuating itself in power, Maduro dixit, increasing the size of the bureaucracy and imposing a state capitalism in the style of pre-war Mexico.


 


In revolutionary Mexico, General “Tata” Cardenas created the pro-Government Confederacion Nacional Campesina (National Peasant Confederation, CNC), converted the Mexican Confederation of Workers into an arm of the Government (1937) and threatened the industrial sector with handing the factories over to the workers. He distributed 19 million hectares, without forgetting a piece for the revolutionary politicians and generals, created the mega-bureaucracy of Pemex, managing to have it lose money for more than 40 years, and also reduced agricultural production by 7%.


 


But the Partido de la Revolucion Mexicana, with its workers', peasants', military and popular sectors, rebaptized in 1946 as the PRI, remained in power until five years ago. The peasants, like here, lacked title to their property and depended financially, commercially and technically on the bureaucratic apparatus. Velasco in Peru and the Sandinistas in Nicaragua did even worse, killing agricultural production, increasing agricultural imports and creating programs and state companies to administer foodstuffs. Poverty and unemployment grew, but the revolutionaries in the unions, bureaucratic positions, mayoralties and governorships got wealthy: a new oligarchy arose. Much like what is happening here.


 


If the purpose were to give away land, with all of the demagoguery, bureaucratic expense and inefficiency that it implies, there are between 15 and 20 million unproductive hectares in the hands of the state. The truth is that they don't even know how much they have. Then why an instantaneous militia census of private land with no technical basis? Why threaten everyone, if the little blue book (the Constitution) clearly says that the Government can expropriate for the public good with a ruling from a court and payment of fair value? Could it be to threaten the Government's adversaries while obtaining applause from the “poor” beneficiaries? It will have some success: some will abandon the business, others will hold on hoping for better times, and others will try to sell, if this Government, flush with foreign currency, decides to purchase the land, even if it does not know what for.


 


In another early surprise, the President will say that the militia has discovered thousands of unproductive hectares, that people are fattening the land and not the cattle. He will not say that there is unproductive land because consumption has eroded, because there is unemployment and no personal or judicial security. Nor will he say that there is no trustworthy census of land use, and that in six years agricultural productivity has been the worst in the continent. He will say that the land belongs to those who work it, that is, as long as it does not belong to that large real estate company, the state, over which he presides.


Seminar at Simon Bolivar University on the mathematics of the recall results: An Overview

September 9, 2004

There was a seminar today at Simon Bolivar University (USB), the leading technical university in Venezuela, on mathematical studies of the recall vote. The event, which was also sponsored by Universidad Central de Venezuela (UCV), was quite interesting. I was planning to write a full report, but unfortunately (for you), maybe fortunately (for me), I forgot my notes at my office, and if I want to speak with precision, I need them.


Perhaps the most interesting part is the effect this is having on the academic community. You have a bunch of mathematicians and physicists applying the tools of their academic and research trade to a real life problem. Additionally, many people are working on the same problem so there is a lively and daily exchange of ideas. This is good for Venezuelan science, independent of the final results.


 


The problem is being looked at from a variety of different angles, going from very pedestrian statistical analysis to sublime techniques, and I am sure some will soon get into using divine ones that I will never be able to understand. Speakers were very careful not to use the word “fraud”, concentrating on “probability”, “likelihood” and other such terms.


 


The first talk was given by Isabel Llatas and it was an overview of the work that is being done or has been done so far. I counted 24 different names of scientists here or abroad looking at the problem from different angles.


 


Llatas showed partial results from the work of Sanso and Prado, which I have posted here, from that of Isbelia Martin, which I posted two nights ago, as well as from that of Luis Raul Pericchi, who has been using Benford's Law to study the results of the referendum vote. Pericchi will speak at the second of these seminars next Thursday, but I found the work very interesting and will mention it later in this post.


 


Llatas showed how people have looked at the available CNE data in many different forms, separating it into data counted electronically and manually, as well as by geographical distribution. What came across from the talk is that a lot of work has already been done in the last three weeks with the available data, and scientists are still working on things, making sure they are right, before publishing or talking about them.


 


After this came two talks which I will dwell on in detail later. The first was by a group of engineers who have looked at the statistical properties of the votes at the center and parish level, finding what they call “irregular” results at a significant number of machines. The second talk was by Raul Jimenez et al., who have been looking at the problem of coincidences and have some interesting formal and practical results, which suggest the coincidences are quite unlikely. One of his most surprising statements was that there are also coincidences in the total number of votes per machine (SI's + NO's), and he has found that these coincidences have the lowest probability of occurring, around one in a million.


 


Before today I had heard of Pericchi's work, but had no idea what it was about until I saw a graph of his results and decided to look into the background. (I have no more details than what I will give at the end of this post.) His work is based on Benford's Law, a concept that, now that I know about it, makes me wonder how I could have lived all these years without it!


 


Benford’s Law


 


Imagine you have a table of the populations of the towns and cities of a given country. These numbers are distributed according to some probability distribution with a mean and a standard deviation. But suppose that rather than look at the full numbers, you looked only at the first (leftmost) digit of each, 1 through 9. Intuitively, most people would think that the probability of that digit being 1, 2, 3… or 9 would be exactly the same. Well, it isn't. If you look at a wide range of statistical tables, such as the prices of stocks on the NYSE, baseball statistics or even the numbers in a company's financial statements, you find that the probability of the first digit being a 1 is 0.301, a 2 is 0.176, a 3 is 0.125… all the way down to 9, whose probability is 0.046.
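The probabilities quoted above come from Benford's formula, P(first digit = d) = log10(1 + 1/d); a few lines reproduce the whole table:

```python
from math import log10

# Benford's Law: P(first digit = d) = log10(1 + 1/d)
# Prints 0.301 for d=1 down to 0.046 for d=9.
for d in range(1, 10):
    print(d, round(log10(1 + 1 / d), 3))
```

Note the nine probabilities sum to exactly 1, since the logs telescope: log10(2/1) + log10(3/2) + … + log10(10/9) = log10(10) = 1.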


 


The following is a table, taken from here, with the first-digit frequencies of numbers found on the front page of a newspaper, in the 1990 census of county populations, and in the prices of the Dow Jones Industrials stocks from 1990-1993.


 



 


The reason is that such quantities tend to be evenly distributed on a logarithmic scale, and many of these processes are logarithmic. Think of stock prices. If you issue stock at $10 and your company grows 100% every five years, the digit 1 will lead your stock price for the first five years, but after that the digit 2 will lead it for less than three years, and the stretches get shorter as the price approaches the next power of ten. So if you have hundreds of stocks, you will always observe more first digits of 1 than of any other number.
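A quick simulation of exactly this effect, sampling a steadily growing price at equal time steps (the starting price and growth rate are arbitrary choices of mine):

```python
from collections import Counter
from math import log10

# Sample an exponentially growing price at equal time steps and tally
# the leading digit; the frequencies should approximate Benford's Law.
price, counts, steps = 10.0, Counter(), 10000
for _ in range(steps):
    counts[str(price)[0]] += 1   # leading digit (price stays >= 10 here)
    price *= 1.0007              # constant growth per step

for d in "123456789":
    print(d, counts[d] / steps, "benford:", round(log10(1 + 1 / int(d)), 3))
```

The run covers roughly three decades of price, so the observed frequency of a leading 1 comes out near 0.3, tapering off toward 9, just as the table above shows for real data.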


 


This turns out to have important consequences in real-life testing. Supposedly (I haven't found the reference), the first time someone saw something fishy in Enron's numbers was because some particular table of figures did not fit Benford's Law.


 


The IRS uses Benford's Law to detect fraud, auditors use it to detect fraud in companies, and companies use it to detect fraud by employees. The reason is simple: if someone tampers with the data, they will likely spread the numbers uniformly, making a 1 as first digit as likely as any other number. The same thing happens when people commit fraud; they spread the amounts around evenly, thinking it will not be noticed. Auditing firms apparently run many tests like this on companies' data, such as customer refund tables and accounts receivable.


 


The calculation extends from the first digit to the first two digits as well: the probability that a number starts with the two-digit string n (from 10 to 99) is log10(1 + 1/n).


 


What I understood today is that Pericchi et al. have applied Benford's Law to the election results, looking at the total votes at each “cuaderno” level. Reportedly (I will report the details when I hear their talk next Thursday), they have found that the machine results do not fit Benford's Law at all, while the manual ones fit it quite well.
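I don't know which test Pericchi et al. actually use, but a generic first-digit goodness-of-fit check along these lines could be sketched as follows (the function and the chi-square cutoff are my own illustration, not their method):

```python
from collections import Counter
from math import log10

def benford_chi2(values):
    """Pearson chi-square statistic of observed first digits against the
    Benford frequencies; values are positive vote tallies. Compare the
    result against a chi-square table with 8 degrees of freedom
    (the 95% critical value is about 15.5; larger means a poorer fit)."""
    digits = Counter(str(v)[0] for v in values if v > 0)
    n = sum(digits.values())
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * log10(1 + 1 / d)
        chi2 += (digits[str(d)] - expected) ** 2 / expected
    return chi2

# Uniformly spread first digits (as tampered data might show) fit badly:
print(benford_chi2(range(1, 1000)))  # far above the 15.5 cutoff
```

One would feed it the per-cuaderno vote totals, separately for machine and manual tallies, and compare the two statistics.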
