Benford's Law: How Tax Frauds are caught...

If you list all the countries in the world and their populations, 27% of the numbers will start with the digit 1. Only 3% of them will start with the digit 9. Something very similar holds if you look at the heights of the 60 tallest structures in the world — whether you measure in meters or in feet. 

The first Digit Phenomenon
This phenomenon, called Benford's Law, helps auditors detect fraud in things like taxes and elections, but it also connects in striking ways to modern physics and mathematics (e.g., power laws in statistical distributions, as well as ergodic theory).

Benford's Law often strikes people as unintuitive because it seems that every digit should have an equal opportunity to start country populations or heights of skyscrapers, like this:


Normal Human Perception
(The delightful figures are from http://www.thecleverest.com/benf...)

This egalitarian intuition about leading digits turns out to be misleading. The situation where every digit is equally likely to start numbers is actually the anomalous one. 

==

Simon Newcomb
Benford's Law is named for physicist Frank Benford, who, in 1938, showed that the non-uniform pattern holds in a wide variety of real lists of numbers (river lengths, molecular weights, street addresses, etc.). But the pattern was first discovered in 1881 by Simon Newcomb. He noticed it while thumbing through logarithm books -- tables used at that time by scientists to do arithmetic with large numbers. Newcomb became intrigued by the fact that the pages listing numbers starting with 1 were far more worn than the other pages. This would not happen if every digit occurred equally often as a first digit in the numbers scientists worked with.

(Newcomb was a remarkable polymath. Despite having little formal education, he made an early, quite accurate measurement of the speed of light and was the first to enunciate the fundamental equation of exchange in economics.)

==

The reason Benford's Law is useful in fraud detection is that most fraudsters, in the process of making up numbers, do not pay attention to the pattern of first digits that shows up in organic data sets [6]. The leading digits in large spreadsheets of legitimate financial numbers (light green in the figure below) tend to be very close to Benford's Law (blue), while ones filled in by guessing randomly look way off (orange), and fraudulent numbers (red) tend to look even more bizarre. When tax sleuths notice these tell-tale "unnatural" patterns in data sets, they call people in for a human audit.
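
As a rough illustration of how such a check could work, here is a minimal Python sketch that scores a list of amounts by how far its first digits stray from Benford's Law. The chi-square score is just one common choice and is my assumption, not necessarily what tax authorities use. The standard Benford formula log10(1 + 1/d) is used for the expected shares, and both data sets are synthetic, generated only for this example.

```python
import math
import random
from collections import Counter

# Benford's Law: expected share of numbers whose first digit is d.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Return the leading digit of a positive integer."""
    return int(str(amount)[0])


def benford_chi_square(amounts):
    """Chi-square distance between observed first digits and Benford's Law."""
    counts = Counter(first_digit(a) for a in amounts)
    n = len(amounts)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items()
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic "organic" amounts: spread evenly on a log scale from 100 to 100,000.
organic = [int(10 ** rng.uniform(2, 5)) for _ in range(5000)]
# Synthetic "made-up" amounts: every value from 100 to 99,999 equally likely.
made_up = [rng.randint(100, 99_999) for _ in range(5000)]

print(f"organic: {benford_chi_square(organic):8.1f}")
print(f"made up: {benford_chi_square(made_up):8.1f}")
```

The made-up list should score far higher than the organic one, which is the kind of signal that would prompt a closer look. A high score is only a flag, not proof of fraud.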

Tax Data and Benford's Law
What are the fraudsters missing? To get a sense of why the uniform distribution isn't so natural, we can reason as follows.

  • First, observe that if you multiply a number by 2, then very often the first digit of the result will be 1. This is certain if the original number started with 5, 6, 7, 8 or 9. So if you begin with the intuitively appealing uniform distribution of leading digits (every leading digit being equally likely) and then multiply all the numbers by 2, the distribution of leading digits will no longer be uniform: there will now be a lot of leading 1's.

    (To describe this phenomenon, I say that multiplication by 2 privileges 1 as a leading digit.)

    This simple observation already tells you that the uniform distribution of leading digits is not really very stable. It doesn't like to persist. It is easy to upset by the innocuous operation of multiplying everything by 2, which is difficult to avoid in the wild!
  • Second, it turns out that many naturally occurring tables of numbers can be thought of as arising from taking some original list and multiplying each entry by a random number of twos.

In view of this, it is natural that we see lower digits overrepresented, and higher digits under-represented, in many naturally occurring data sets.
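
The first observation is easy to check by hand or with a few lines of Python. The sketch below uses the three-digit numbers 100 to 999 (my choice of example list, in which every leading digit appears equally often) and tallies the leading digits before and after doubling.

```python
from collections import Counter


def first_digit(n):
    """Return the leading digit of a positive integer."""
    return int(str(n)[0])


# Every three-digit number once: each leading digit 1..9 appears exactly 100 times.
numbers = range(100, 1000)
before = Counter(first_digit(n) for n in numbers)

# Double every number and tally the leading digits again.
after = Counter(first_digit(2 * n) for n in numbers)

print("digit  before  after")
for d in range(1, 10):
    print(f"{d:5d}  {before[d]:6d}  {after[d]:5d}")
```

After doubling, 500 of the 900 numbers start with 1 (everything from 500 to 999 lands between 1000 and 1998), and each of the other digits leads only 50 numbers.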

To explore the explanation in more depth, let's focus on the example of country populations. These tend to grow over time. Think of growing as starting from a random size and being multiplied by 2 a (random) number of times, different for each country (depending on growth rate). Since multiplication by 2 privileges the digit 1 as a leading digit, it's not surprising that a lot of the final numbers start with ones. More than start with nines.
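
Here is a toy simulation of that picture, offered as a sketch rather than a model of real demographics. The six-digit starting sizes, the range of 0 to 99 doublings, and the function name simulate_population are all assumptions made up for the illustration.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def simulate_population(rng):
    """Start from a random six-digit size and double it a random number of times."""
    start = rng.randint(100_000, 999_999)  # every leading digit equally likely
    doublings = rng.randrange(100)         # hypothetical range: 0 to 99 doublings
    return start * 2 ** doublings          # exact integer arithmetic


samples = 50_000
counts = Counter(int(str(simulate_population(rng))[0]) for _ in range(samples))
shares = {d: counts[d] / samples for d in range(1, 10)}

print("digit  share")
for d in range(1, 10):
    print(f"{d:5d}  {shares[d]:.3f}")
```

Even though the starting sizes have perfectly uniform leading digits, the final numbers should start with 1 roughly 30% of the time and with 9 only about 5% of the time.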

(By the way, it's not just multiplication by 2 that privileges 1 as a leading digit. Multiplication by most numbers privileges lower initial digits, in a sense that is made precise below. So does division by most numbers.) 

Maybe the way to think about it is this. To get a list of numbers not to satisfy Benford's Law you need to build it that way (say, by writing down a list of 6-digit numbers and rolling a 10-sided die to pick all the digits). And then you need to make sure no creature comes along after you are done and multiplies all of them by something a bit unpredictable. But actually, it's very hard to exclude such a creature, because sometimes it is nature (as with population growth) and sometimes it is another source of unpredictable proportional change. And those idiosyncratic multiplications (or divisions) typically privilege lower initial digits.

==

This explains the qualitative phenomenon that 1 appears as a leading digit more often than 9 does. But what explains the quantitative Benford's Law distribution? 

That is, why do we expect to see that about 30% of numbers start with 1, while 10% of numbers have a leading 4, and only 5% of numbers start with 9? Where do those percentages come from?

We saw above that the uniform distribution of leading digits — an 11% probability for each potential leading digit — is not stable when you multiply all the numbers by 2. If every leading digit starts out being equally represented, that stops being true after you multiply by 2.

It turns out there is a distribution of leading digits that does not get upset after multiplying by 2 in this way: it remains stable. That special distribution is precisely the Benford's Law distribution in the first figure in this post. And that's not just true for multiplication by 2; the distribution is stable when you multiply by any number between 1 and 10. The Benford's Law distribution is the only one that has this property, and once you know that, it is easy to work out what it has to be.
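
For the curious, the standard statement of Benford's Law is that the share of numbers with leading digit d is log10(1 + 1/d), which gives about 30.1% for 1, 9.7% for 4 and 4.6% for 9. The Python sketch below checks the stability claim numerically under one assumption of mine: a "Benford-distributed" list is modelled as numbers whose base-10 logarithms have fractional parts spread evenly between 0 and 1. The grid size n and the multiplier c are arbitrary choices for the example.

```python
import math

# Benford's Law: share of numbers with leading digit d.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_shares(log_mantissas):
    """Tally leading digits of the numbers 10**m, for each m in [0, 1)."""
    counts = dict.fromkeys(range(1, 10), 0)
    for m in log_mantissas:
        counts[int(10 ** m)] += 1
    return {d: k / len(log_mantissas) for d, k in counts.items()}


# Model list: fractional parts of log10 spread evenly over [0, 1).
n = 100_000
original = [(i + 0.5) / n for i in range(n)]

# Multiplying a number by c adds log10(c) to its logarithm; only the
# fractional part of the logarithm decides the leading digit.
c = 2
scaled = [(m + math.log10(c)) % 1.0 for m in original]

shares_before = leading_digit_shares(original)
shares_after = leading_digit_shares(scaled)

print("digit  Benford  before  after")
for d in range(1, 10):
    print(f"{d:5d}  {benford[d]:.4f}  {shares_before[d]:.4f}  {shares_after[d]:.4f}")
```

The three columns should agree to within rounding, and changing c to any other positive number should leave them unchanged, because shifting an evenly spread set of fractional parts around the circle leaves it evenly spread.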

For more detail on this and many other mathematical facts about Benford's Law, see the beautiful blog post by Terry Tao at http://terrytao.wordpress.com/20..., as well as a presentation (slides only) by Michelle Manes at http://www.math.hawaii.edu/home/....

==
For more info, check out:
[1] The Effective Use of Benford's Law to Detect Fraud in Accounting Data, http://dbentrance.com/blog/?p=112
[2] http://www.math.hawaii.edu/home/...
[3] http://www.rexswain.com/benford....