This microbook is a summary/original review based on the book: Super Crunchers: Why Thinking-By-Numbers is the New Way To Be Smart
Until just a few decades ago, most decisions were based either on experience or on intuition. Experts were nothing more than gray-haired people whose decades of individual trial-and-error practice allowed them to make somewhat faster and more accurate decisions. How things have changed!
With the availability of large databases and recent improvements in deep learning methodology, experiential expertise is quickly becoming a thing of the past. The experts of the 21st century – dubbed “Super Crunchers” in the title of Ian Ayres’ engrossing 2007 business bestseller – will never again need to rely on their know-how, familiarity with a certain subject, or provocative journal articles. They have already started numerous revolutions across several industries by merely crunching the numbers – and discovering associations never before put forward, let alone studied in detail.
So, get ready to learn how algorithms and custom formulas are changing the world around you and how they have already made it possible for some people to predict the future.
“When you buy a good red wine,” says Orley Ashenfelter, a professor of economics at Princeton University, “you're always making an investment, in the sense that it's probably going to be better later. And what you'd like to know is not what it's worth now, but what it's going to be worth in the future.”
And that’s what wine critics such as Robert M. Parker Jr. are for: his 100-point wine rating scale and his newsletter “The Wine Advocate” are so influential that they are considered one of the major factors in setting the prices for Bordeaux wines in the United States. Parker is not the only one who thinks it is practically impossible to evaluate these wines properly without sufficient know-how and before enough time has passed. According to Bruce Kaiser, former director of the wine department at an auction house, not even the best sommelier can do that until the wine is “at least ten years old, perhaps older.”
Well, Ashenfelter begs to differ – he believes he has devised a formula that can predict the quality of Bordeaux wines immediately, long before the wine has matured. Since the production process is virtually the same from year to year, he reasoned, is there really any objectivity in what Parker does? As one of the most respected quantitative economists in the world, it wasn’t difficult for him to realize that there must be some connection between the quality of certain Bordeaux vintages and the region’s weather patterns in the year the wines were produced.
Comparing vast amounts of statistical data from 1952 to 1980, Ashenfelter discovered that the greatest wines are probably the result of low levels of harvest rain and high average summer temperatures. He had the temerity to translate this into a single formula:
Wine quality = 12.145 + 0.00117 × winter rainfall + 0.0614 × average growing season temperature − 0.00386 × harvest rainfall
That’s right: all Ashenfelter needs to do to predict the general quality of any vintage is to plug that year’s weather statistics into this formula. “It may seem a bit mathematical,” he says, “but this is exactly the way the French ranked their vineyards back in the famous 1855 classification.”
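To see how little machinery the prediction requires, here is a minimal Python sketch of the equation as quoted above. The coefficients are the ones from the text; the weather figures in the example are hypothetical, chosen only to illustrate the calculation.

```python
def predict_vintage_quality(winter_rain_mm, growing_season_temp_c, harvest_rain_mm):
    """Ashenfelter's Bordeaux equation, using the coefficients quoted in the text."""
    return (12.145
            + 0.00117 * winter_rain_mm
            + 0.0614 * growing_season_temp_c
            - 0.00386 * harvest_rain_mm)

# Two hypothetical vintages: same winter rain, but one warm/dry growing season
# and one cool/wet one.
warm_dry = predict_vintage_quality(600, 17.5, 100)
cool_wet = predict_vintage_quality(600, 15.0, 300)
# The warm, dry vintage comes out with the higher predicted quality,
# exactly as Ashenfelter's data from 1952-1980 would suggest.
```

Note how harvest rainfall enters with a negative sign: a wet harvest drags the predicted quality down, while a warm growing season pushes it up.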
Though traditional wine experts such as Parker scoffed at Ashenfelter’s formula when it first appeared more than a decade ago, wine dealers immediately saw its usefulness and have used it quite often to determine whether a Bordeaux is a good investment or not. And year after year, Ashenfelter’s formula has proven to be remarkably on the mark.
The reason why a quantitative economist was able to make wine tasters obsolete with a single formula can be summed up in two words: large datasets. And we really do mean large: they are not measured in gigabytes or even terabytes anymore, with some business and government databases measuring several petabytes (a petabyte is 1,000 terabytes)!
With so much data available – and with the help of supercomputers and the ever-evolving AI – economists and statisticians are now able to detect previously unimagined associations, and they can achieve this in no more than a few days. The implications, however, are usually revolutionary and often result in making several professions and countless studies obsolete basically overnight.
This is what Ayres dubs “Super Crunching,” which he defines briefly as “statistical analysis that impacts real-world decisions” and describes, at some length, as humanity’s first step toward a better, more evidence-based future. You’re already living in it: it is thanks to large datasets and super crunching that YouTube and Netflix know which videos you’d probably enjoy before you do. The same approach explains why sabermetricians – practitioners of sabermetrics, the empirical analysis of baseball – are part of every baseball franchise nowadays.
One of the main tools of super crunching is regression, the technique Ashenfelter used. Developed by Charles Darwin’s cousin Francis Galton, regression is “a statistical procedure that takes raw historical data and estimates how various causal factors influence a single variable of interest.” It is a tool so powerful that it can even determine whether you’re compatible with your romantic interest or not. That’s how the online dating site eHarmony matches people. Its founder and driving force, Neil Clark Warren, started the site after studying over 5,000 married couples in the late 1990s.
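The mechanics of Galton-style regression on a single causal factor fit in a few lines of ordinary-least-squares Python. The data below are made up purely to show how a slope and intercept are estimated from raw historical observations.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Hypothetical historical data: the "variable of interest" (outcome)
# grows roughly in step with one causal factor.
factor  = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = fit_line(factor, outcome)
# slope comes out close to 2: each extra unit of the causal factor
# adds about two units to the predicted outcome.
```

Ashenfelter’s wine equation is exactly this idea extended to three causal factors instead of one; eHarmony’s model extends it to 29.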
Based on their happiness, Warren devised “a predictive statistical model of compatibility based on 29 different variables related to a person’s emotional temperament, social style, cognitive mode, and relationship skills.” Thanks to a proprietary questionnaire that each eHarmony member must fill in when signing up, the site’s algorithm can not only detect what kind of person you are but also match you with someone whose traits have proven compatible in most past scenarios.
Both Ashenfelter’s formula and eHarmony’s algorithm try to predict the future using historical data. However, in their cases, there is a single variable of interest – whether the wine is good, and whether two people are compatible – and a limited number of causal factors. Unfortunately, when you rely on historical data, it is not always easy to “tease out causation.”
For example, if you want to find out whether chemotherapy works better than radiation, 5,000 historical cancer cases might not be enough – mainly because of the highly unpredictable and usually uncontrolled environment of past studies. Smoking wasn’t always on the list of recognized health hazards, so how can you know which past patients were smokers? Or, back when the Food and Drug Administration (FDA) wasn’t as rigorous as it is today, how would you know whether some patients were allergic to ingredients used in chemotherapy?
Fortunately, as early as 1925, Ronald Fisher – the greatest of Darwin’s successors and the father of modern statistics – realized that there might be a better way to test whether particular medical interventions have their predicted effect: randomized trials.
The mechanism is quite simple: if you randomly assign all those cancer patients to two groups, it is statistically quite probable that roughly the same number of smokers will wind up in each group. Fisher was ahead of his time, though, and it took official institutions and medical facilities decades to run a randomized trial on humans.
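The balancing effect of random assignment is easy to demonstrate. The sketch below uses a made-up roster of 5,000 patients, 30% of them smokers, and shows that a shuffled half-and-half split leaves nearly identical smoker rates in both groups – without anyone having to know who smokes.

```python
import random

def random_split(subjects):
    """Shuffle the subjects and split them into two equal-sized groups."""
    pool = list(subjects)
    random.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

random.seed(42)  # fixed seed so the illustration is reproducible
# Hypothetical patient records: (patient_id, is_smoker); exactly 30% are smokers.
patients = [(i, i % 10 < 3) for i in range(5000)]
treatment, control = random_split(patients)

def smoker_rate(group):
    return sum(1 for _, is_smoker in group if is_smoker) / len(group)

gap = abs(smoker_rate(treatment) - smoker_rate(control))
# gap is a small fraction of the 30% base rate: the two groups are comparable,
# so any difference in outcomes can be attributed to the treatment itself.
```

The same logic neutralizes every confounder at once – known ones like smoking and unknown ones alike – which is exactly why randomization became the gold standard Fisher envisioned.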
However, the very first of its kind – which attempted to understand the effect of an early antibiotic against tuberculosis – was successful enough to start a medical revolution. Today, with the encouragement of the FDA, “randomized tests have become the gold standard for proving whether or not medical treatments are efficacious.”
Even better: thanks to advances in technology, randomized tests are now regularly used by leaders in the business and financial world, and even by NGOs and governments. For example, it was a randomized test on 600,000 prospects that allowed Capital One – a pioneering super cruncher – to discover that offering a teaser rate of 4.9% for six months would profit the corporation far more than offering a 7.9% rate for 12 months. Likewise, some of the most ambitious policy-shaping social experiments of the past decade – such as the U.S. Department of Housing and Urban Development’s “Moving to Opportunity” and MIT’s “Poverty Action Lab” – have used the power of randomized trials to see how we can help poor people find apartments or help developing countries fight poverty.
Regressions and randomized trials are the two most fundamental statistical techniques. However, even though they have been around in theory for more than a century, only now, with the advent of big data and artificial intelligence, can we truly experience how powerful they are and why “the art of quantitative prediction is reshaping business and government.” Simply put, what we call intuition and expertise is nothing more than a truncated and less accurate version of what supercomputers do with large datasets: finding associations and links based on experience. But since humans are biologically incapable of processing such enormous amounts of data, does that mean that experts are a thing of the past?
In a way, it does. When chess grandmaster Garry Kasparov lost to the Deep Blue computer, it wasn’t because of its speed or because of IBM’s extremely smart software. He lost because of the computer’s ability to immediately access a database of 700,000 grandmaster games and accurately calculate the power of different positions before making any move. A champion’s expertise and experience-based intuition lost out to data-based decision making – and, whether in chess, baseball, wine tasting, or medical prognosis, this has been happening all the time and all around the world ever since.
Long before this became so obvious, way back in 1954, a psychologist named Paul Meehl published a “disturbing little book” titled “Clinical Versus Statistical Prediction,” which compared data from about 20 other empirical studies on the subject. The book revealed to the world that statistical equations routinely outpredict human experts. Meehl spent the rest of his life making similar comparisons. Just before he died, together with his Minnesota protégé William Grove, he completed a meta-analysis of 136 of these man-versus-machine studies, which demonstrated that statistical prediction “decisively outperforms” expert prediction: in only 8 of the 136 studies were the experts found to be appreciably more accurate than the equations.
It may sound bleak, but there is a good chance that, regardless of their background, most human experts of the future will be number crunchers first and foremost. Apparently, in a world of large datasets, you can’t allow yourself to be statistically illiterate anymore – even if you are a wine taster or a writer. Yes, you heard that right: super crunching has been quietly changing even the creative industries!
And the man responsible for that is Dick Copaken, a former Washington lawyer who, soon after retiring, founded a company named Epagogix to train a neural network capable of predicting a movie’s box-office success based primarily on the qualities of its script. In 2006, Malcolm Gladwell broke the story in one of his celebrated The New Yorker articles, and the world soon discovered that Epagogix had already worked with several Hollywood studios in secret.
In fact, its script-based revenue-predicting formula had already been rigorously tested: in “a paradigm-shifting experiment,” it successfully predicted the profitability of 6 out of 9 films – twice as well as traditional Hollywood experts. And bear in mind that the neural network developed by Epagogix analyzed only scripts, and accurately predicted the movies’ gross revenues long before the stars or the directors had even been chosen!
In the past, Hollywood executives used the phrase “to shoot a turkey” to mean that they’ve produced a box office bomb. Epagogix has reversed the meaning of that phrase by 180 degrees: using super crunching, they are now able to quite literally shoot turkeys and prevent bad pictures from ever coming into existence.
Steven D. Levitt, University of Chicago economist and one of the co-authors of “Freakonomics,” describes “Super Crunchers” as a groundbreaking book that is not only fun to read, but can change the way you think as well.
Believe it or not, that’s an understatement: “Super Crunchers” will definitely change the way you think and will probably make you wonder whether we could build a better future for everybody if, instead of making decisions ourselves, we just started providing inputs to computer algorithms and let them decide for us.
We know that sounds dystopian. But somehow, “Super Crunchers” makes it attractive.
Even though we’ve evolved to trust our intuition, it seems that we would be much better off putting our faith in numbers. No, scratch that: it’s not a question of faith in the second case. Numbers, as they say, don’t lie: they have just started telling bigger truths now that we can crunch so many of them in such short periods. The future belongs to large datasets, so statistical literacy isn’t just a bullet point in your résumé anymore, but a survival tool. Especially if you are an entrepreneur.
Ian Ayres is an American lawyer and economist. He is a professor at both Yale’s Law School and School of Management, and a fellow of the American Academy of Arts and Sciences.