There was a seminar today at Simon Bolivar University (USB), the leading technical university in Venezuela, on mathematical studies of the recall vote. The event, which was also sponsored by Universidad Central de Venezuela (UCV), was quite interesting. I was planning to write a full report, but unfortunately for you (and maybe fortunately for me) I forgot my notes at my office, and I need them if I want to speak with precision.

Perhaps the most interesting part is the effect this is having on the academic community. You have a bunch of mathematicians and physicists applying the tools of their academic and research trade to a real life problem. Additionally, many people are working on the same problem so there is a lively and daily exchange of ideas. This is good for Venezuelan science, independent of the final results.

The problem is being looked at from a variety of angles, ranging from very pedestrian statistical analysis to sublime techniques, and I am sure some will soon get into divine ones that I will never be able to understand. Speakers were very careful not to use the word “fraud”, concentrating instead on “probability”, “likelihood” and other such terms.

The first talk was given by Isabel Llatas and it was an overview of the work that is being done or has been done so far. I counted 24 different names of scientists here or abroad looking at the problem from different angles.

Llatas showed partial results from the work of Sanso and Prado, which I have posted here; from that of Isbelia Martin, which I posted two nights ago; and from that of Luis Raul Pericchi, who has been using Benford’s Law to study the results of the referendum vote. Pericchi will speak at the second of these seminars next Thursday, but I found the work very interesting and will mention it later in this post.

Llatas showed how people have looked at the available CNE data in many different forms, separating it into data which was counted electronically and data which was counted manually, as well as by geographical distribution. What came across from the talk is that a lot of work has already been done in the last three weeks with the available data, and scientists are still working on their results, making sure they are right before publishing them or talking about them.

After this came two talks, which I will dwell on in detail later. The first was by a group of engineers who have looked at the statistical properties of the votes at the center and parish level, finding what they call “irregular” results at a significant number of machines. The second talk was by Raul Jimenez et al., who have been looking at the problem of coincidences and have some interesting formal and practical results, which suggest the coincidences are quite unlikely. One of his most surprising statements was that there are also coincidences in the total number of votes per machine (SI’s plus NO’s), and he has found that these coincidences have the lowest probability of occurring, something like one in a million.

Before today I had heard of Pericchi’s work, but had no idea what it was about until I saw a graph of his results and decided to look into the background. (I have no more details than what I will give at the end of this post.) His work is based on Benford’s Law, a concept that, now that I know about it, makes me wonder how I could have lived all these years without it!

Imagine you have a table of populations of towns and cities for a given country. These numbers are distributed according to a probability distribution with a mean and a standard deviation. But suppose that rather than look at the full number you looked only at the first (leftmost) digit of each number, which can be anything from 1 through 9. Intuitively, most people would think that the probability of that digit being 1, 2, 3… or 9 would be exactly the same. Well, it isn’t. If you look at a wide range of statistical tables, such as the prices of stocks in the NYSE, baseball statistics or even the numbers in the financial statements of a company, you find that the probability of that first digit being a 1 is 0.301, a 2 is 0.176, a 3 is 0.125… all the way down to 9, the probability of which is 0.04576.
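For the curious, the probabilities quoted above come from a simple formula: under Benford’s Law the chance that the first digit is d equals log10(1 + 1/d). Here is a minimal Python sketch of that formula; the function name `benford_first_digit` is just my own label for illustration.

```python
import math


def benford_first_digit(d: int) -> float:
    """Probability that the leading digit equals d (1 through 9) under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Table of the nine first-digit probabilities.
probs = {d: benford_first_digit(d) for d in range(1, 10)}

for d, p in probs.items():
    print(d, round(p, 3))
```

The nine probabilities add up to one, as they should, since every number has to start with one of those digits.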

The following is a table taken from here with the first-digit frequencies found in the numbers on the front page of a newspaper, in the 1990 census of county populations and in the prices of the stocks in the Dow Jones Industrials from 1990 to 1993.

The reason for this is that the numbers are evenly distributed on a logarithmic scale, because many of these processes grow multiplicatively rather than additively. Think of stock prices. If you issue stock at $10 and your company grows 100% every five years, the digit 1 would be the first digit of your stock price for the first five years (from $10 to $20), but after that the digit 2 would be the first digit for only about three years (from $20 to $30), and the length of time gets shorter for each higher digit as the stock price grows. So, if you have hundreds of stocks, you will always observe more first digits with a one than with any other number.
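The stock example can be checked with a few lines of Python. This is only a sketch under my own assumptions: the price starts at $10 and compounds smoothly so that it doubles every five years, and the function name `years_with_leading_digit` is made up for the illustration.

```python
import math


def years_with_leading_digit(d: int, doubling_years: float = 5.0) -> float:
    """Years the price spends between d*10 and (d+1)*10 dollars.

    Assumes (hypothetically) a price that starts at $10 and grows
    exponentially, doubling every `doubling_years` years.
    """
    return doubling_years * math.log2((d + 1) / d)


# Time spent with each leading digit on the way from $10 to $100.
durations = {d: years_with_leading_digit(d) for d in range(1, 10)}
total_years = sum(durations.values())

# Fraction of the whole $10 to $100 trip spent on each leading digit.
shares = {d: durations[d] / total_years for d in durations}

for d in range(1, 10):
    print(d, round(durations[d], 2), round(shares[d], 3))
```

The share of time spent on each leading digit works out to log10(1 + 1/d), which is exactly the Benford probability for that digit, whatever the doubling time is.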

This turns out to have important consequences in real life testing. Supposedly (I haven’t found the reference) the first time someone saw something fishy in Enron’s numbers was because some particular table of numbers did not fit Benford’s Law.

The IRS uses Benford’s Law to detect fraud, auditors use it to detect fraud in companies, and companies use it to detect fraud by employees. The reason is simple: if someone tampers with the data, they will likely spread the numbers uniformly, so that a 1 as the first digit becomes as likely as any other digit. The same thing happens if people commit fraud; they spread the amounts around evenly, thinking that it will not be noticed. Auditing forms apparently have many tests like this for companies’ data, such as customer refund tables and accounts receivable.
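To give a feel for how such a check might work, here is a rough Python sketch that compares the first digits of a list of numbers against Benford’s Law using a plain chi-square statistic. I do not know which test the IRS, auditors or any of the seminar speakers actually use; the chi-square test, the function names and the two made-up data sets (powers of two, which are known to follow Benford’s Law, and the integers 100 to 999, whose first digits are uniform) are my own choices for illustration.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """First (leftmost) digit of a nonzero integer."""
    return int(str(abs(n))[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first digits against Benford's Law."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


# Standard 5% critical value of the chi-square distribution with 8 degrees
# of freedom (nine digits minus one).
CRITICAL_5_PERCENT = 15.507

# Synthetic illustration data, not election or accounting data.
benford_like = [2 ** k for k in range(1000)]   # first digits follow Benford
uniform_like = list(range(100, 1000))          # first digits are uniform

chi2_benford_like = benford_chi_square(benford_like)
chi2_uniform_like = benford_chi_square(uniform_like)

print("powers of two:", chi2_benford_like < CRITICAL_5_PERCENT)
print("uniform digits:", chi2_uniform_like < CRITICAL_5_PERCENT)
```

A statistic below the critical value means the data are compatible with Benford’s Law at that level; a statistic far above it, as with the uniformly spread numbers, is the kind of red flag described above.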

You can extend the calculation from the first digit to the first two digits and work out the expected probabilities in that case too.
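As far as I can tell the two-digit version uses the same formula, with the first two digits read as one number n between 10 and 99: the probability is log10(1 + 1/n). A minimal Python sketch, again with a function name of my own invention:

```python
import math


def benford_first_two_digits(n: int) -> float:
    """Probability that a number starts with the two digits n (10 through 99)."""
    if not 10 <= n <= 99:
        raise ValueError("first two digits must be between 10 and 99")
    return math.log10(1 + 1 / n)


# All ninety two-digit probabilities.
two_digit_probs = {n: benford_first_two_digits(n) for n in range(10, 100)}

# Adding up 10 through 19 recovers the single-digit probability of a leading 1.
leading_one = sum(two_digit_probs[n] for n in range(10, 20))

print(round(two_digit_probs[10], 4), round(two_digit_probs[99], 4))
print(round(leading_one, 3))
```

A nice consistency check is that the ten probabilities for 10 through 19 add up to the single-digit probability of starting with a 1.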

What I understood today is that what Pericchi et al. have done is to apply Benford’s law to the election results, looking at the total votes at each “cuaderno” level. Reportedly, and I will report the details on it when I hear their talk next Thursday, they have found that the machine results do not fit Benford’s law at all, while the manual ones fit it quite well.