Lies, Damned Lies and Statistics

There are three kinds of lies: lies, damned lies, and statistics.

-Mark Twain

I have a confession to make. Many, many moons ago, my roommates and I were living in a perpetual state of poverty and were looking to make some cash. A friend of ours came across a table of statistics used for some casino games (particularly craps) and so we decided to host our own (quite illegal) casino in our apartment.

We designed the craps table ourselves and also had a table where we played poker. We invited friends over, who could either choose to pay us a fee to enter the party (2 USD) or else promise to gamble at least 2 USD at either poker or the craps table. In return, we provided a keg of beer and a good time.

We certainly weren’t smart enough to calculate the odds on different outcomes for different dice throws (around which the game of craps is based), but our friend had a book which had them laid out for us. Just like any regular casino from London to Las Vegas, we knew that the odds were in favor of the “house” (us) and that even if a few of our friends won some money, at the end of the night we would come out ahead.

And indeed we did profit – and quite nicely to boot. I believe my personal cut of the profit at the end of each of our casino nights was close to 100 dollars. Some friends did, yes, win some money and come out ahead. But the table of predicted outcomes for each dice throw (for craps – poker is a separate game involving skill) was accurate.

Why does any of this matter? And what does this have to do with Romania, in particular?

I’ve noticed that many people confuse calculating probability (or “the odds”) with statistics, and sometimes have difficulty separating the two. One of them is easy to do with a calculator and the other is difficult to the point of being impossible.

Assuming we were using “fair” dice (i.e. there is an equal chance any of the six sides of the die will land facing up), then using a simple equation it’s very easy to calculate the odds for future dice rolls. In other words, with one die, we can accurately predict that any given dice throw will have a 1 in 6 chance of landing on the number 6. If our craps table used only one die (normally it uses 2) then we could cap a payoff and say that anyone who rolls a 6 will be paid six times their original bet.

One individual player might roll two 6’s in a row (costing the house big time) but over the course of the night, many, many other players will roll everything BUT a 6 and with these payouts, the house will come out “even”, neither losing money nor making any.
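The break-even claim can be checked with a few lines of arithmetic. What follows is only a minimal sketch: it assumes that “paid six times their original bet” means the player gets back six units in total for every one unit staked, and the function name house_edge is made up for this illustration.

```python
from fractions import Fraction

def house_edge(total_return_multiple, win_probability=Fraction(1, 6)):
    """Expected house profit per 1 unit bet.

    Assumption: the house keeps the stake on a loss and hands back
    `total_return_multiple` units (stake included) on a win.
    """
    return 1 - win_probability * total_return_multiple

print(house_edge(6))  # 0   -> the house breaks even in the long run
print(house_edge(5))  # 1/6 -> shave the payout and the house profits
```

That second line is the whole casino business in miniature: pay out slightly less than the true odds and the long-run profit is built in.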

Games like roulette and craps use fixed systems to play (and no skill is involved) and therefore calculating probability (or “the odds”) is very easy to do. Therefore we (and every other casino in the world) can modify the payouts based on probability and be assured that we will always profit in the end, even if some individual players take more money from us than they risked on the table.

Statistics is the same process but reversed, which is a huge difference.

Let’s imagine that we wanted to do a survey of every resident of Cluj to see how they view the mayor. We ask them a question “Do you think the mayor is doing a good job?” with the only two possible answers being “yes” or “no”.

Because surveying all 300,000 residents would be a difficult and expensive task, we poll only 300 residents. Hypothetically, exactly half (150 people) say “yes” the mayor is doing a good job and half (150 people) say “no”, he isn’t doing a good job. Therefore, per our survey, the mayor of Cluj has a 50% approval rating.

This is statistics. And our job is to look at this sample (300 people) and calculate how likely it is that the sample accurately represents all the people in Cluj. In other words, what if there is an entire neighborhood of people who hate the mayor and we managed to miss surveying them (or likewise, a large pocket of mayor supporters)? Statistics is the use of known data (the people we surveyed) and applying certain mathematical formulas to retroactively determine the probability that those data reflect “reality” (i.e. all 300,000 residents of Cluj). For more on this, see the margin of error.
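For the curious, here is a rough sketch of the standard margin of error calculation for the survey described above. It assumes a simple random sample and a 95% confidence level (z = 1.96), and the function name margin_of_error is mine, not from any particular library.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sampled proportion.

    Assumptions: simple random sample, normal approximation,
    z = 1.96 for roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

moe = margin_of_error(0.5, 300)
print(f"{moe:.3f}")  # 0.057, i.e. roughly plus or minus 5.7 percentage points
```

Note that the formula only holds if the 300 people really were picked at random. If we skipped that neighborhood full of people who hate the mayor, no formula will tell us so.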

For our craps game, we were calculating the probability of a future event (the dice throws our friends were going to make). Statistics would mean recording every dice throw actually made during the night and then, once it was over, using those records to work backwards to the probabilities of certain dice throw combinations (i.e. the chance of rolling two 6’s, for instance).

Assuming we had enough dice throws (data points) and our dice were indeed fair, then we can come to a rather accurate assessment of probabilities (that a 6 comes up in about 1 in 6 throws, for instance).
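To see how “enough dice throws” plays out, here is a small simulation sketch. It stands in for a night of recorded throws by using Python’s random module as the fair die; the function name estimate_frequencies and the seed value are arbitrary choices for the illustration.

```python
import random
from collections import Counter

def estimate_frequencies(num_throws, seed=None):
    """Simulate fair die throws and return the observed share of each face."""
    rng = random.Random(seed)  # seeded so the run can be repeated
    counts = Counter(rng.randint(1, 6) for _ in range(num_throws))
    return {face: counts[face] / num_throws for face in range(1, 7)}

# The more throws we record, the closer the share of 6's tends to sit to 1/6.
for n in (60, 6000, 600000):
    freqs = estimate_frequencies(n, seed=2011)
    print(n, round(freqs[6], 4))
```

With only a handful of throws the observed share of 6’s can be well off 1 in 6; it generally settles down as the number of recorded throws grows.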

All of the above is relatively straightforward. The way statistics become “lies” is when they are used to calculate future probabilities.

Let’s assume we use basic knowledge of statistics (margin of error, etc) and feel very confident that the mayor does indeed have (approximately) a 50% approval rating. This information is quite useful for the time interval of our survey (“today”) but tells us nothing about tomorrow. Tomorrow the mayor might be caught doing something scandalous and his approval rating will plunge.

Therefore even if the mayor (or the president of Egypt, for instance) has been popular for a long time, that approval rating could suddenly plunge (or rise), sometimes very quickly, and all of our past data and statistics on his approval rating would have been completely useless in predicting the future.

So why were we – three mathematically illiterate Americans – able to so accurately predict the future probabilities of our friends’ dice throws and yet professional mathematicians and statisticians are unable to predict the future approval rating of a political leader?

The answer is the ludic fallacy. We knew that no matter what crazy stuff our friends did, they could never roll a 15 (with a single die) or a 5,000. There was a known limit on what could happen (i.e. only one of the six die faces would be rolled). It was a “closed set”. We couldn’t predict what particular numbers would be rolled, but we could predict what they wouldn’t be (higher than 6 per die or less than 1).
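The “closed set” idea is easy to show for the two dice a real craps table uses. This is just an illustrative sketch that lists every possible pair of faces and counts the totals; the variable names are my own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

# Every one of the 36 equally likely (first die, second die) pairs.
counts = Counter(a + b for a, b in product(FACES, repeat=2))
total = sum(counts.values())
probabilities = {s: Fraction(c, total) for s, c in sorted(counts.items())}

print(min(probabilities), max(probabilities))  # 2 12
print(probabilities[7])                        # 1/6
print(probabilities.get(15, Fraction(0)))      # 0 -> a 15 simply cannot happen
```

Because the full list of outcomes fits on one page, every probability can be written down in advance. There is no such list for tigers, mayors or economies.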

In real life, the (legal) casino known as The Mirage in Las Vegas has never lost money from gamblers betting on its craps tables. The dice there can only have fixed possible outcomes and therefore the odds can be calculated ahead of time. However, The Mirage lost hundreds of millions of dollars in 2003 when a tiger in the popular Siegfried and Roy show mauled Roy, partly paralyzing him.

From Wikipedia:

An MGM Mirage spokesman said losing Siegfried & Roy is a bigger hit to the Mirage brand than to its finances, because the entertainers are “practically the faces” of the hotel, and finding a new hotel brand or identity will be difficult.

Indeed. Calculating the odds on dice throws = possible. Calculating the odds on a tiger suddenly mauling its handler, with whom it had worked and lived all of its life (it even slept in his bedroom at night!) = impossible.

Siegfried and Roy had done 5,750 shows together with tigers and had been an enormously profitable act in Vegas for decades. People far smarter than I at The Mirage had signed the act to a “lifetime contract” and built their entire image around them. And yet all the statistics (5,750 shows) in the world could not predict or calculate the probability that suddenly one of the tigers would put an end to the show and cripple one of the main performers.

Again, what does all of this have to do with Romania? Well when the government borrows billions of euros, it has to have in place a plan to predict the future ability to pay back those loans. I am quite confident that armies of statisticians went through every sector of the Romanian economy, looking at past performance and then using these data to calculate the future probability of income in order to GET the loan.

And yet, as should be obvious, these future predictions are useless. A “Siegfried and Roy” moment could happen at any time. An accidental gas leak could blow up a major factory. Or likewise, some western celebrity might endorse driving Dacia cars and then suddenly everyone wants to buy one, driving up demand. And so on and so forth.

There are several terms people have coined to describe realms wherein calculations of future probability are either possible or else (effectively) impossible. Let’s look at something that is possible to predict:

For example, if someone went out and measured the height of every Romanian in the country and it came to an average of 1.7 meters (just making up a number here) then yes it would be possible to predict (with a high level of probability) that next year the average height of all Romanians will also be around 1.7 meters.

And yet if you (accurately) calculated that the average income for a Romanian was 1000 lei per month and used this to predict that next year’s average income would also be around 1000 lei per month, you might be extremely wrong. The economy could suffer some enormous hit and incomes could drop radically. Or else a large deposit of petroleum reserves could be discovered and billions of euros worth of contracts signed and suddenly the average income could rise dramatically.

The difference between these two is scalability. There are biological factors preventing people from suddenly growing 3 meters tall (or shrinking to 1 meter). Height, therefore, is not scalable. It’s like our craps table – there can’t be any sudden shifts in values (i.e. two dice together can’t add up to 15).

But there are no limits on income as it is scalable. If half the population were thrown out of work, then their effective income could fall to zero. Or if some valuable resource were suddenly discovered, incomes could rise dramatically.

I started out tracking this “witch tax” story because it offended me that people were taking cheap shots at Romania. But when I saw the ludicrous legislation attempting to fine “witches” who “guessed incorrectly”, I began to wonder who exactly is driving the ship of state around here.

An old lady in some remote village throwing cards or reading palms to “predict the future” is a relatively harmless affair. But when serious, sober-faced men and women in suits in Bucharest pompously pretend to predict the future of Romania’s economy, and take billions of dollars in euros to gamble on those predictions, we could all be in a lot of trouble.

For all of Ceausescu’s faults (of which there are many), he did at least theoretically show this country the fallacy of making these kinds of predictions. Angered at Khrushchev’s determination that Romania should remain largely agricultural, Ceausescu turned to the IMF and western bankers to borrow billions of dollars. Why? Some of the money was stolen but most of it went to modernize this country – build factories and other heavy industry.

He then forced Romanians into a condition of semi-starvation to pay back those loans and – unbelievably – succeeded in doing so. But what happened to all the factories? What happened to all the modernization? We all know what happened. The technology was substandard and completely incapable of competing in the post-1989 world. Most of the factories are now shut down. All of that money, in essence, was completely wasted. And all of those cold, hungry years in the 1980s were suffered in vain. Despite being debt free on the day Ceausescu was killed, Romania was left in a worse situation than if it had never “modernized” at all.

And here we are doing it all over again, borrowing billions of euros to “modernize” and “improve” the economy. Ceausescu is dead but the bankers are the same ones he borrowed from. And yet all the same sober, serious-faced bureaucrats and politicians want us to believe that this time they know what they’re doing, that this time the “right” investments and modernizations will be put into effect, that this time Romania will take a “Great Leap Forward” and the economy will grow and incomes will rise and soon we’ll all be fiddling with our iPhones from the back of our Mercedes limousines.

I wonder when Constantin “The Genius from Dolj” Dascalu is going to write a bill punishing these technocrats if their “predictions of the future” fail to come true, especially since their theories of calculating future probabilities are largely based on their “theories, beliefs and empirical experiences”, otherwise known as lies, damned lies and statistics.

One can only hope ;)

6 thoughts on “Lies, Damned Lies and Statistics”

Hadrian says:

August 12, 2011 at 00:57

Hi, there, Sam:).
You made me laugh, reading about your romanian language lessons.
I really liked to read your stories.
A, well, i found some funny informations like: Peles-house/home of Enescu, castle made in feudal times ( the royal house made it with their own money, from Germany )
OK, but these are nothings (probably you should find someone to read and gives you few more exactly informations blabla :)…
Anyway, you have talent and i really enjoy your “umor” and epic journey , kind of life journey, in Romania. I really appreciate your positive energy etc etc :)
You are a good friend. Salut.

Forget my “english”; it’s almost 2 am..however..:))


Alina says:

February 17, 2011 at 11:25

Nice post, very enlightening, and no, it’s not too long :) I finally understood statistics :P


androxa says:

February 13, 2011 at 16:55

Well ..I just “discovered” your blog a few days ago and I enjoy it very much ..its a very refreshing look at romania :-)


Caroline says:

February 13, 2011 at 02:26

I enjoy your posts, but they are too long and too packed with details that don’t matter. I am interested in obtaining your book, but hope that it will be more hard-hitting and pithy.

Being blunt here. You are a smart, productive person and I know you can take criticism.


1. Sam R. says:
  
  February 13, 2011 at 08:26
  
  haha yes I can take criticism :) I appreciate your thoughts but I got to write what’s in my head. Today could be statistics, tomorrow could be cat pictures! :D
  