Part 3: THOSE GRAY SWANS OF EXTREMISTAN

It’s time to deal in some depth with four final items that bear on our Black Swan.

Primo, I have said earlier that the world is moving deeper into Extremistan, that it is less and less governed by Mediocristan – in fact, this idea is more subtle than that. I will show how and present the various ideas we have about the formation of inequality. Secondo, I have been describing the Gaussian bell curve as a contagious and severe delusion, and it is time to get into that point in some depth. Terso, I will present what I call Mandelbrotian, or fractal, randomness. Remember that for an event to be a Black Swan, it does not just have to be rare, or just wild; it has to be unexpected, has to lie outside our tunnel of possibilities. You must be a sucker for it. As it happens, many rare events can yield their structure to us: it is not easy to compute their probability, but it is easy to get a general idea about the possibility of their occurrence. We can turn these Black Swans into Gray Swans, so to speak, reducing their surprise effect. A person aware of the possibility of such events can come to belong to the non-sucker variety.

Finally, I will present the ideas of those philosophers who focus on phony uncertainty. I organized this book in such a way that the more technical (though nonessential) sections are here; these can be skipped without any loss to the thoughtful reader, particularly Chapters 15, 17, and the second half of Chapter 16. I will alert the reader with footnotes. The reader less interested in the mechanics of deviations can then directly proceed to Part 4.

Chapter Fourteen: FROM MEDIOCRISTAN TO EXTREMISTAN, AND BACK

I prefer Horowitz – How to fall from favor – The long tail – Get ready for some surprises – It’s not just money

Let us see how an increasingly man-made planet can evolve away from mild into wild randomness. First, I describe how we get to Extremistan. Then, I will take a look at its evolution.

The World Is Unfair

Is the world that unfair? I have spent my entire life studying randomness, practicing randomness, hating randomness. The more that time passes, the worse things seem to me, the more scared I get, the more disgusted I am with Mother Nature. The more I think about my subject, the more I see evidence that the world we have in our minds is different from the one playing outside. Every morning the world appears to me more random than it did the day before, and humans seem to be even more fooled by it than they were the previous day. It is becoming unbearable. I find writing these lines painful; I find the world revolting.

Two “soft” scientists propose intuitive models for the development of this inequity: one is a mainstream economist, the other a sociologist. Both simplify a little too much. I will present their ideas because they are easy to understand, not because of the scientific quality of their insights or any consequences in their discoveries; then I will show the story as seen from the vantage point of the natural scientists.

Let me start with the economist Sherwin Rosen. In the early eighties, he wrote papers about “the economics of superstars”. In one of the papers he conveyed his sense of outrage that a basketball player could earn $1.2 million a year, or a television celebrity could make $2 million. To get an idea of how this concentration is increasing – i.e., of how we are moving away from Mediocristan – consider that television celebrities and sports stars (even in Europe) get contracts today, only two decades later, worth in the hundreds of millions of dollars! The extreme is about (so far) twenty times higher than it was two decades ago!

According to Rosen, this inequality comes from a tournament effect: someone who is marginally “better” can easily win the entire pot, leaving the others with nothing. Using an argument from Chapter 3, people prefer to pay $10.99 for a recording featuring Horowitz to $9.99 for a struggling pianist. Would you rather read Kundera for $13.99 or some unknown author for $1? So it looks like a tournament, where the winner grabs the whole thing – and he does not have to win by much.

But the role of luck is missing in Rosen’s beautiful argument. The problem here is the notion of “better”, this focus on skills as leading to success. Random outcomes, or an arbitrary situation, can also explain success, and provide the initial push that leads to a winner-take-all result. A person can get slightly ahead for entirely random reasons; because we like to imitate one another, we will flock to him. The world of contagion is so underestimated!

As I am writing these lines I am using a Macintosh, by Apple, after years of using Microsoft-based products. The Apple technology is vastly better, yet the inferior software won the day. How? Luck.

The Matthew Effect

More than a decade before Rosen, the sociologist of science Robert K. Merton presented his idea of the Matthew effect, by which people take from the poor to give to the rich.[45] He looked at the performance of scientists and showed how an initial advantage follows someone through life. Consider the following process.

Let’s say someone writes an academic paper quoting fifty people who have worked on the subject and provided background materials for his study; assume, for the sake of simplicity, that all fifty are of equal merit. Another researcher working on the exact same subject will randomly cite three of those fifty in his bibliography. Merton showed that many academics cite references without having read the original work; rather, they’ll read a paper and draw their own citations from among its sources. So a third researcher reading the second article selects three of the previously referenced authors for his citations. These three authors will receive cumulatively more and more attention as their names become associated more tightly with the subject at hand. The difference between the winning three and the other members of the original cohort is mostly luck: they were initially chosen not for their greater skill, but simply for the way their names appeared in the prior bibliography. Thanks to their reputations, these successful academics will go on writing papers and their work will be easily accepted for publication. Academic success is partly (but significantly) a lottery.[46]
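
To see how little skill is needed for the gap to open, here is a minimal simulation of the citation lottery just described. Everything in it is an assumption made for illustration: fifty equally deserving authors, a thousand later papers, three citations copied per paper. Copying citations from earlier bibliographies amounts to citing an author in proportion to how often he is already cited, which is the cumulative-advantage step.

```python
import random
from collections import Counter

# A minimal sketch of the citation lottery (hypothetical parameters).
# Fifty equally deserving authors start with one citation each (the first paper).
# Each later paper copies 3 citations from earlier bibliographies, which is
# equivalent to citing authors in proportion to how often they are already cited.

random.seed(42)
AUTHORS = 50
PAPERS = 1_000
CITATIONS_PER_PAPER = 3

counts = Counter({author: 1 for author in range(AUTHORS)})

for _ in range(PAPERS):
    pool = list(counts.keys())
    weights = [counts[a] for a in pool]
    cited = set()
    while len(cited) < CITATIONS_PER_PAPER:
        cited.add(random.choices(pool, weights=weights, k=1)[0])
    for author in cited:
        counts[author] += 1

top = counts.most_common(3)
total = sum(counts.values())
share = sum(c for _, c in top) / total
print("Top 3 authors:", top)
print(f"Share of all citations held by the top 3: {share:.0%}")
```

Run it with different seeds and the identity of the winning three changes, but there are always a winning three: the concentration comes from the copying, not from merit.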

It is easy to test the effect of reputation. One way would be to find papers that were written by famous scientists, had their authors’ identities changed by mistake, and got rejected. You could verify how many of these rejections were subsequently overturned after the true identities of the authors were established. Note that scholars are judged mostly on how many times their work is referenced in other people’s work, and thus cliques of people who quote one another are formed (it’s an “I quote you, you quote me” type of business).

Eventually, authors who are not often cited will drop out of the game by, say, going to work for the government (if they are of a gentle nature), or for the Mafia, or for a Wall Street firm (if they have a high level of hormones). Those who got a good push in the beginning of their scholarly careers will keep getting persistent cumulative advantages throughout life. It is easier for the rich to get richer, for the famous to become more famous.

In sociology, Matthew effects bear the less literary name “cumulative advantage”. This theory can easily apply to companies, businessmen, actors, writers, and anyone else who benefits from past success. If you get published in The New Yorker because the color of your letterhead attracted the attention of the editor, who was daydreaming of daisies, the resultant reward can follow you for life. More significantly, it will follow others for life. Failure is also cumulative; losers are likely to also lose in the future, even if we don’t take into account the mechanism of demoralization that might exacerbate it and cause additional failure.

Note that art, because of its dependence on word of mouth, is extremely prone to these cumulative-advantage effects. I mentioned clustering in Chapter 1, and how journalism helps perpetuate these clusters. Our opinions about artistic merit are the result of arbitrary contagion even more than our political ideas are. One person writes a book review; another person reads it and writes a commentary that uses the same arguments. Soon you have several hundred reviews that actually sum up in their contents to no more than two or three because there is so much overlap. For an anecdotal example read Fire the Bastards!, whose author, Jack Green, goes systematically through the reviews of William Gaddis’s novel The Recognitions. Green shows clearly how book reviewers anchor on other reviews and reveals powerful mutual influence, even in their wording. This phenomenon is reminiscent of the herding of financial analysts I discussed in Chapter 10.

The advent of the modern media has accelerated these cumulative advantages. The sociologist Pierre Bourdieu noted a link between the increased concentration of success and the globalization of culture and economic life. But I am not trying to play sociologist here, only show that unpredictable elements can play a role in social outcomes.

Merton’s cumulative-advantage idea has a more general precursor, “preferential attachment”, which, reversing the chronology (though not the logic), I will present next. Merton was interested in the social aspect of knowledge, not in the dynamics of social randomness, so his studies were derived separately from research on the dynamics of randomness in more mathematical sciences.

Lingua Franca

The theory of preferential attachment is ubiquitous in its applications: it can explain why city size is from Extremistan, why vocabulary is concentrated among a small number of words, or why bacteria populations can vary hugely in size.

The scientists J. C. Willis and G. U. Yule published a landmark paper in Nature in 1922 called “Some Statistics of Evolution and Geographical Distribution in Plants and Animals, and Their Significance”. Willis and Yule noted the presence in biology of the so-called power laws, tractable versions of the scalable randomness that I discussed in Chapter 3. These power laws (on which more technical information appears in the following chapters) had been noticed earlier by Vilfredo Pareto, who found that they applied to the distribution of income. Later, Yule presented a simple model showing how power laws can be generated. His point was as follows: Let’s say species split in two at some constant rate, so that new species arise. The richer in species a genus is, the richer it will tend to get, with the same logic as the Matthew effect. Note the following caveat: in Yule’s model the species never die out.

During the 1940s, a Harvard linguist, George Zipf, examined the properties of language and came up with an empirical regularity now known as Zipf’s law, which, of course, is not a law (and if it were, it would not be Zipf’s). It is just another way to think about the process of inequality. The mechanisms he described were as follows: the more you use a word, the less effortful you will find it to use that word again, so you borrow words from your private dictionary in proportion to their past use. This explains why out of the sixty thousand main words in English, only a few hundred constitute the bulk of what is used in writings, and even fewer appear regularly in conversation. Likewise, the more people aggregate in a particular city, the more likely a stranger will be to pick that city as his destination. The big get bigger and the small stay small, or get relatively smaller.
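
A toy version of this mechanism, with made-up parameters, shows the concentration appearing on its own: a "speaker" coins a new word with small probability and otherwise reuses an old word in proportion to its past use.

```python
import random
from collections import Counter

# A toy version of the Zipf-style mechanism (hypothetical parameters):
# with small probability a speaker coins a new word; otherwise a word is
# reused in proportion to how often it has been used before.

random.seed(7)
NEW_WORD_PROB = 0.05
TOKENS = 200_000

text = [0]          # word ids, in order of use
next_word = 1

for _ in range(TOKENS):
    if random.random() < NEW_WORD_PROB:
        text.append(next_word)             # coin a new word
        next_word += 1
    else:
        text.append(random.choice(text))   # reuse, proportionally to past use

freq = Counter(text)
ranked = [count for _, count in freq.most_common()]
print(f"Vocabulary size: {len(ranked)} words")
print(f"Share of all usage taken by the 100 most-used words: {sum(ranked[:100]) / len(text):.0%}")
```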

A great illustration of preferential attachment can be seen in the mushrooming use of English as a lingua franca – though not for its intrinsic qualities, but because people need to use one single language, or stick to one as much as possible, when they are having a conversation. So whatever language appears to have the upper hand will suddenly draw people in droves; its usage will spread like an epidemic, and other languages will be rapidly dislodged. I am often amazed to listen to conversations between people from two neighboring countries, say, between a Turk and an Iranian, or a Lebanese and a Cypriot, communicating in bad English, moving their hands for emphasis, searching for these words that come out of their throats at the cost of great physical effort. Even members of the Swiss Army use English (not French) as a lingua franca (it would be fun to listen). Consider that a very small minority of Americans of northern European descent is from England; traditionally the preponderant ethnic groups are of German, Irish, Dutch, French, and other northern European extraction. Yet because all these groups now use English as their main tongue, they have to study the roots of their adoptive tongue and develop a cultural association with parts of a particular wet island, along with its history, its traditions, and its customs!

Ideas and Contagions

The same model can be used for the contagions and concentration of ideas. But there are some restrictions on the nature of epidemics I must discuss here. Ideas do not spread without some form of structure. Recall the discussion in Chapter 4 about how we come prepared to make inferences. Just as we tend to generalize some matters but not others, so there seem to be “basins of attraction” directing us to certain beliefs. Some ideas will prove contagious, but not others; some forms of superstitions will spread, but not others; some types of religious beliefs will dominate, but not others. The anthropologist, cognitive scientist, and philosopher Dan Sperber has proposed the following idea on the epidemiology of representations. What people call “memes”, ideas that spread and that compete with one another using people as carriers, are not truly like genes. Ideas spread because, alas, they have for carriers self-serving agents who are interested in them, and interested in distorting them in the replication process. You do not make a cake for the sake of merely replicating a recipe – you try to make your own cake, using ideas from others to improve it. We humans are not photocopiers. So contagious mental categories must be those in which we are prepared to believe, perhaps even programmed to believe. To be contagious, a mental category must agree with our nature.

NOBODY IS SAFE IN EXTREMISTAN

There is something extremely naïve about all these models of the dynamics of concentration I’ve presented so far, particularly the socioeconomic ones. For instance, although Merton’s idea includes luck, it misses an additional layer of randomness. In all these models the winner stays a winner. Now, a loser might always remain a loser, but a winner could be unseated by someone new popping up out of nowhere. Nobody is safe.

Preferential-attachment theories are intuitively appealing, but they do not account for the possibility of being supplanted by newcomers – what every schoolchild knows as the decline of civilizations. Consider the logic of cities: How did Rome, with a population of 1.2 million in the first century A.D., end up with a population of twelve thousand in the third? How did Baltimore, once a principal American city, become a relic? And how did Philadelphia come to be overshadowed by New York?

A Brooklyn Frenchman

When I started trading foreign exchange, I befriended a fellow named Vincent who exactly resembled a Brooklyn trader, down to the mannerisms of Fat Tony, except that he spoke the French version of Brooklynese. Vincent taught me a few tricks. Among his sayings were “Trading may have princes, but nobody stays a king” and “The people you meet on the way up, you will meet again on the way down”.

There were theories when I was a child about class warfare and struggles by innocent individuals against powerful monster-corporations capable of swallowing the world. Anyone with intellectual hunger was fed these theories, which were inherited from the Marxist belief that the tools of exploitation were self-feeding, that the powerful would grow more and more powerful, furthering the unfairness of the system. But one had only to look around to see that these large corporate monsters dropped like flies. Take a cross section of the dominant corporations at any particular time; many of them will be out of business a few decades later, while firms nobody ever heard of will have popped onto the scene from some garage in California or from some college dorm.

Consider the following sobering statistic. Of the five hundred largest U.S. companies in 1957, only seventy-four were still part of that select group, the Standard and Poor’s 500, forty years later. Only a few had disappeared in mergers; the rest either shrank or went bust.

Interestingly, almost all these large corporations were located in the most capitalist country on earth, the United States. The more socialist a country’s orientation, the easier it was for the large corporate monsters to stick around. Why did capitalism (and not socialism) destroy these ogres?

In other words, if you leave companies alone, they tend to get eaten up. Those in favor of economic freedom claim that beastly and greedy corporations pose no threat because competition keeps them in check. What I saw at the Wharton School convinced me that the real reason includes a large share of something else: chance.

But when people discuss chance (which they rarely do), they usually only look at their own luck. The luck of others counts greatly. Another corporation may luck out thanks to a blockbuster product and displace the current winners. Capitalism is, among other things, the revitalization of the world thanks to the opportunity to be lucky. Luck is the grand equalizer, because almost everyone can benefit from it. The socialist governments protected their monsters and, by doing so, killed potential newcomers in the womb.

Everything is transitory. Luck both made and unmade Carthage; it both made and unmade Rome.

I said earlier that randomness is bad, but it is not always so. Luck is far more egalitarian than even intelligence. If people were rewarded strictly according to their abilities, things would still be unfair – people don’t choose their abilities. Randomness has the beneficial effect of reshuffling society’s cards, knocking down the big guy.

In the arts, fads do the same job. A newcomer may benefit from a fad, as followers multiply thanks to a preferential attachment-style epidemic. Then, guess what? He too becomes history. It is quite interesting to look at the acclaimed authors of a particular era and see how many have dropped out of consciousness. It even happens in countries such as France where the government supports established reputations, just as it supports ailing large companies.

When I visit Beirut, I often spot in relatives’ homes the remnants of a series of distinctively white-leather-bound “Nobel books”. Some hyperactive salesman once managed to populate private libraries with these beautifully made volumes; many people buy books for decorative purposes and want a simple selection criterion. The criterion this series offered was one book by a Nobel winner in literature every year – a simple way to build the ultimate library. The series was supposed to be updated every year, but I presume the company went out of business in the eighties. I feel a pang every time I look at these volumes: Do you hear much today about Sully Prudhomme (the first recipient), Pearl Buck (an American woman), Romain Rolland, Anatole France (the last two were the most famous French authors of their generations), St. John Perse, Roger Martin du Gard, or Frédéric Mistral?

The Long Tail

I have said that nobody is safe in Extremistan. This has a converse: nobody is threatened with complete extinction either. Our current environment allows the little guy to bide his time in the antechamber of success – as long as there is life, there is hope.

This idea was recently revived by Chris Anderson, one of a very few who get the point that the dynamics of fractal concentration has another layer of randomness. He packaged it with his idea of the “long tail”, about which in a moment. Anderson is lucky not to be a professional statistician (people who have had the misfortune of going through conventional statistical training think we live in Mediocristan). He was able to take a fresh look at the dynamics of the world.

True, the Web produces acute concentration. A large number of users visit just a few sites, such as Google, which, at the time of this writing, has total market dominance. At no time in history has a company grown so dominant so quickly – Google can service people from Nicaragua to southwestern Mongolia to the American West Coast, without having to worry about phone operators, shipping, delivery, and manufacturing. This is the ultimate winner-take-all case study.

People forget, though, that before Google, Alta Vista dominated the search-engine market. I am prepared to revise the Google metaphor by replacing it with a new name for future editions of this book.

What Anderson saw is that the Web causes something in addition to concentration. The Web enables the formation of a reservoir of proto-Googles waiting in the background. It also promotes the inverse Google, that is, it allows people with a technical specialty to find a small, stable audience.

Recall the role of the Web in Yevgenia Krasnova’s success. Thanks to the Internet, she was able to bypass conventional publishers. Her publisher with the pink glasses would not even have been in business had it not been for the Web. Let’s assume that Amazon.com does not exist, and that you have written a sophisticated book. Odds are that a very small bookstore that carries only 5,000 volumes will not be interested in letting your “beautifully crafted prose” occupy premium shelf space. And the megabookstore, such as the average American Barnes & Noble, might stock 130,000 volumes, which is still not sufficient to accommodate marginal titles. So your work is stillborn.

Not so with Web vendors. A Web bookstore can carry a near-infinite number of books since it need not have them physically in inventory. Actually, nobody needs to have them physically in inventory since they can remain in digital form until they are needed in print, an emerging business called print-on-demand.

So as the author of this little book, you can sit there, bide your time, be available in search engines, and perhaps benefit from an occasional epidemic. In fact, the quality of readership has improved markedly over the past few years thanks to the availability of these more sophisticated books. This is a fertile environment for diversity.[47]

Plenty of people have called me to discuss the idea of the long tail, which seems to be the exact opposite of the concentration implied by scalability. The long tail implies that the small guys, collectively, should control a large segment of culture and commerce, thanks to the niches and subspecialties that can now survive thanks to the Internet. But, strangely, it can also imply a large measure of inequality: a large base of small guys and a very small number of supergiants, together representing a share of the world’s culture – with some of the small guys, on occasion, rising to knock out the winners. (This is the “double tail”: a large tail of the small guys, a small tail of the big guys.)

The role of the long tail is fundamental in changing the dynamics of success, destabilizing the well-seated winner, and bringing about another winner. In a snapshot this will always be Extremistan, always ruled by the concentration of type-2 randomness; but it will be an ever-changing Extremistan.

The long tail’s contribution is not yet numerical; it is still confined to the Web and its small-scale online commerce. But consider how the long tail could affect the future of culture, information, and political life. It could free us from the dominant political parties, from the academic system, from the clusters of the press – anything that is currently in the hands of ossified, conceited, and self-serving authority. The long tail will help foster cognitive diversity. One highlight of the year 2006 was to find in my mailbox a draft manuscript of a book called Cognitive Diversity: How Our Individual Differences Produce Collective Benefits, by Scott Page. Page examines the effects of cognitive diversity on problem solving and shows how variability in views and methods acts like an engine for tinkering. It works like evolution. By subverting the big structures we also get rid of the Platonified one way of doing things – in the end, the bottom-up theory-free empiricist should prevail.

In sum, the long tail is a by-product of Extremistan that makes it somewhat less unfair: the world is made no less unfair for the little guy, but it now becomes extremely unfair for the big man. Nobody is truly established. The little guy is very subversive.

Naïve Globalization

We are gliding into disorder, but not necessarily bad disorder. This implies that we will see more periods of calm and stability, with most problems concentrated into a small number of Black Swans.

Consider the nature of past wars. The twentieth century was not the deadliest (in percentage of the total population), but it brought something new: the beginning of the Extremistan warfare – a small probability of a conflict degenerating into total decimation of the human race, a conflict from which nobody is safe anywhere.

A similar effect is taking place in economic life. I spoke about globalization in Chapter 3; it is here, but it is not all for the good: it creates interlocking fragility, while reducing volatility and giving the appearance of stability. In other words it creates devastating Black Swans. We have never lived before under the threat of a global collapse. Financial institutions have been merging into a smaller number of very large banks. Almost all banks are now interrelated. So the financial ecology is swelling into gigantic, incestuous, bureaucratic banks (often Gaussianized in their risk measurement) – when one falls, they all fall.[48] The increased concentration among banks seems to have the effect of making financial crises less likely, but when they happen they are more global in scale and hit us very hard. We have moved from a diversified ecology of small banks, with varied lending policies, to a more homogeneous framework of firms that all resemble one another. True, we now have fewer failures, but when they occur … I shiver at the thought. I rephrase here: we will have fewer but more severe crises. The rarer the event, the less we know about its odds. This means that we know less and less about the possibility of a crisis.

And we have some idea how such a crisis would happen. A network is an assemblage of elements called nodes that are somehow connected to one another by a link; the world’s airports constitute a network, as does the World Wide Web, as do social connections and electricity grids. There is a branch of research called “network theory” that studies the organization of such networks and the links between their nodes, with such researchers as Duncan Watts, Steven Strogatz, Albert-Laszlo Barabasi, and many more. They all understand Extremistan mathematics and the inadequacy of the Gaussian bell curve. They have uncovered the following property of networks: there is a concentration among a few nodes that serve as central connections. Networks have a natural tendency to organize themselves around an extremely concentrated architecture: a few nodes are extremely connected; others barely so. The distribution of these connections has a scalable structure of the kind we will discuss in Chapters 15 and 16. Concentration of this kind is not limited to the Internet; it appears in social life (a small number of people are connected to others), in electricity grids, in communications networks. This seems to make networks more robust: random insults to most parts of the network will not be consequential since they are likely to hit a poorly connected spot. But it also makes networks more vulnerable to Black Swans. Just consider what would happen if there is a problem with a major node. The electricity blackout experienced in the northeastern United States during August 2003, with its consequential mayhem, is a perfect example of what could take place if one of the big banks went under today.
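
A rough sketch of this fragility, with all parameters invented for illustration: grow a small preferential-attachment network, then compare the damage from random node failures with the damage from losing the most connected hubs.

```python
import random
from collections import defaultdict, deque

# Sketch of network robustness versus hub fragility (hypothetical parameters).
# We grow a preferential-attachment network, then compare what happens when
# random nodes fail versus when the most-connected hubs fail.

random.seed(1)
N, LINKS_PER_NODE = 2_000, 2

edges = [(0, 1)]     # seed network
stubs = [0, 1]       # each node appears once per link end, so sampling stubs is preferential attachment

for new in range(2, N):
    targets = {random.choice(stubs) for _ in range(LINKS_PER_NODE)}
    for t in targets:
        edges.append((new, t))
        stubs += [new, t]

def largest_component(removed):
    graph = defaultdict(set)
    for a, b in edges:
        if a not in removed and b not in removed:
            graph[a].add(b)
            graph[b].add(a)
    seen, best = set(), 0
    for start in graph:
        if start in seen:
            continue
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in graph[node] - seen:
                seen.add(nb)
                queue.append(nb)
        best = max(best, size)
    return best

degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

k = N // 20  # remove 5 percent of the nodes
random_failure = set(random.sample(range(N), k))
hub_failure = set(sorted(degree, key=degree.get, reverse=True)[:k])

print("Largest component, intact network:     ", largest_component(set()))
print("Largest component, 5% random failures: ", largest_component(random_failure))
print("Largest component, top 5% hubs removed:", largest_component(hub_failure))
```

Random insults leave the connected core almost untouched; knocking out the hubs does not.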

But banks are in a far worse situation than the Internet. The financial industry has no significant long tail! We would be far better off if there were a different ecology, in which financial institutions went bust on occasion and were rapidly replaced by new ones, thus mirroring the diversity of Internet businesses and the resilience of the Internet economy. Or if there were a long tail of government officials and civil servants coming to reinvigorate bureaucracies.

REVERSALS AWAY FROM EXTREMISTAN

There is, inevitably, a mounting tension between our society, full of concentration, and our classical idea of aurea mediocritas, the golden mean, so it is conceivable that efforts may be made to reverse such concentration. We live in a society of one person, one vote, where progressive taxes have been enacted precisely to weaken the winners. Indeed, the rules of society can be easily rewritten by those at the bottom of the pyramid to prevent concentration from hurting them. But it does not require voting to do so – religion could soften the problem. Consider that before Christianity, in many societies the powerful had many wives, thus preventing those at the bottom from accessing wombs, a condition that is not too different from the reproductive exclusivity of alpha males in many species. But Christianity reversed this, thanks to the one man-one woman rule. Later, Islam came to limit the number of wives to four. Judaism, which had been polygynous, became monogamous in the Middle Ages. One can say that such a strategy has been successful – the institution of tightly monogamous marriage (with no official concubine, as in the Greco-Roman days), even when practiced the “French way”, provides social stability since there is no pool of angry, sexually deprived men at the bottom fomenting a revolution just so they can have the chance to mate.

But I find the emphasis on economic inequality, at the expense of other types of inequality, extremely bothersome. Fairness is not exclusively an economic matter; it becomes less and less so when we are satisfying our basic material needs. It is pecking order that matters! The superstars will always be there. The Soviets may have flattened the economic structure, but they encouraged their own brand of übermensch. What is poorly understood, or denied (owing to its unsettling implications), is the absence of a role for the average in intellectual production. The disproportionate share of the very few in intellectual influence is even more unsettling than the unequal distribution of wealth – unsettling because, unlike the income gap, no social policy can eliminate it. Communism could conceal or compress income discrepancies, but it could not eliminate the superstar system in intellectual life.

It has even been shown, by Michael Marmot of the Whitehall Studies, that those at the top of the pecking order live longer, even when adjusting for disease. Marmot’s impressive project shows how social rank alone can affect longevity. It was calculated that actors who win an Oscar tend to live on average about five years longer than their peers who don’t. People live longer in societies that have flatter social gradients. Winners kill their peers as those in a steep social gradient live shorter lives, regardless of their economic condition.

I do not know how to remedy this (except through religious beliefs). Is insurance against your peers’ demoralizing success possible? Should the Nobel Prize be banned? Granted, the Nobel medal in economics has not been good for society or knowledge, but even those rewarded for real contributions in medicine and physics too rapidly displace others from our consciousness, and steal longevity away from them. Extremistan is here to stay, so we have to live with it, and find the tricks that make it more palatable.

Chapter Fifteen: THE BELL CURVE, THAT GREAT INTELLECTUAL FRAUD[49]

Not worth a pastis – Quételet’s error – The average man is a monster – Let’s deify it – Yes or no – Not so literary an experiment

Forget everything you heard in college statistics or probability theory. If you never took such a class, even better. Let us start from the very beginning.

THE GAUSSIAN AND THE MANDELBROTIAN

I was transiting through the Frankfurt airport in December 2001, on my way from Oslo to Zurich.

I had time to kill at the airport and it was a great opportunity for me to buy dark European chocolate, especially since I have managed to successfully convince myself that airport calories don’t count. The cashier handed me, among other things, a ten deutschmark bill, an (illegal) scan of which can be seen on the next page. The deutschmark banknotes were going to be put out of circulation in a matter of days, since Europe was switching to the euro. I kept it as a valedictory. Before the arrival of the euro, Europe had plenty of national currencies, which was good for printers, money changers, and of course currency traders like this (more or less) humble author. As I was eating my dark European chocolate and wistfully looking at the bill, I almost choked: I suddenly noticed, for the first time, that there was something curious about it. The bill bore the portrait of Carl Friedrich Gauss and a picture of his Gaussian bell curve.


The last ten deutschmark bill, representing Gauss and, to his right, the bell curve of Mediocristan.

The striking irony here is that the last possible object that can be linked to the German currency is precisely such a curve: the reichsmark (as the currency was previously called) went from four per dollar to four trillion per dollar in the space of a few years during the 1920s, an outcome that tells you that the bell curve is meaningless as a description of the randomness in currency fluctuations. All you need to reject the bell curve is for such a movement to occur once, and only once – just consider the consequences. Yet there was the bell curve, and next to it Herr Professor Doktor Gauss, unprepossessing, a little stern, certainly not someone I’d want to spend time with lounging on a terrace, drinking pastis, and holding a conversation without a subject.

Shockingly, the bell curve is used as a risk-measurement tool by those regulators and central bankers who wear dark suits and talk in a boring way about currencies.

The Increase in the Decrease

The main point of the Gaussian, as I’ve said, is that most observations hover around the mediocre, the average; the odds of a deviation decline faster and faster (exponentially) as you move away from the average. If you must have only one single piece of information, this is the one: the dramatic increase in the speed of decline in the odds as you move away from the center, or the average. Look at the list below for an illustration of this. I am taking an example of a Gaussian quantity, such as height, and simplifying it a bit to make it more illustrative. Assume that the average height (men and women) is 1.67 meters, or 5 feet 7 inches. Take what I call a unit of deviation here to be 10 centimeters. Let us look at increments above 1.67 meters and consider the odds of someone being that tall.[50]

10 centimeters taller than the average (i.e., taller than 1.77 m, or 5 feet 10): 1 in 6.3

20 centimeters taller than the average (i.e., taller than 1.87 m, or 6 feet 2): 1 in 44

30 centimeters taller than the average (i.e., taller than 1.97 m, or 6 feet 6): 1 in 740

40 centimeters taller than the average (i.e., taller than 2.07 m, or 6 feet 9): 1 in 32,000

50 centimeters taller than the average (i.e., taller than 2.17 m, or 7 feet 1): 1 in 3,500,000

60 centimeters taller than the average (i.e., taller than 2.27 m, or 7 feet 5): 1 in 1,000,000,000

70 centimeters taller than the average (i.e., taller than 2.37 m, or 7 feet 9): 1 in 780,000,000,000

80 centimeters taller than the average (i.e., taller than 2.47 m, or 8 feet 1): 1 in 1,600,000,000,000,000

90 centimeters taller than the average (i.e., taller than 2.57 m, or 8 feet 5): 1 in 8,900,000,000,000,000,000

100 centimeters taller than the average (i.e., taller than 2.67 m, or 8 feet 9): 1 in 130,000,000,000,000,000,000,000

… and,

110 centimeters taller than the average (i.e., taller than 2.77 m, or 9 feet 1): 1 in 5,200,000,000,000,000,000,000,000,000.

Note that soon after, I believe, 22 deviations, or 220 centimeters taller than the average, the odds reach a googol, which is 1 with 100 zeroes behind it.

The point of this list is to illustrate the acceleration. Look at the difference in odds between 60 and 70 centimeters taller than average: for a mere increase of four inches, we go from one in 1 billion people to one in 780 billion! As for the jump between 70 and 80 centimeters: an additional four inches takes us from one in 780 billion to one in 1.6 million billion![51]
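
These odds can be recovered directly from the Gaussian tail, assuming, as the list does, that each 10-centimeter unit of deviation corresponds to one standard deviation; a short sketch:

```python
from math import erfc, sqrt

# Reproduces the flavor of the list above, assuming each 10 cm "unit of
# deviation" is one standard deviation of a Gaussian. The tail probability
# of a standard normal is 0.5 * erfc(z / sqrt(2)).

for units in range(1, 11):
    tail = 0.5 * erfc(units / sqrt(2))
    print(f"{10*units:3d} cm above the average: about 1 in {1/tail:.3g}")
```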

This precipitous decline in the odds of encountering something is what allows you to ignore outliers. Only one curve can deliver this decline, and it is the bell curve (and its nonscalable siblings).

The Mandelbrotian

By comparison, look at the odds of being rich in Europe. Assume that wealth there is scalable, i.e., Mandelbrotian. (This is not an accurate description of wealth in Europe; it is simplified to emphasize the logic of scalable distribution.)[52]

Scalable Wealth Distribution

People with a net worth higher than €1 million: 1 in 62.5

Higher than €2 million: 1 in 250

Higher than €4 million: 1 in 1,000

Higher than €8 million: 1 in 4,000

Higher than €16 million: 1 in 16,000

Higher than €32 million: 1 in 64,000

Higher than €320 million: 1 in 6,400,000

The speed of the decrease here remains constant (or does not decline)! When you double the amount of money you cut the incidence by a factor of four, no matter the level, whether you are at €8 million or €16 million. This, in a nutshell, illustrates the difference between Mediocristan and Extremistan.
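
The whole list follows a single rule, a power law with tail exponent 2, anchored on its first line; the sketch below reproduces it, and setting the exponent to 1 instead gives, with the same anchor, roughly the more concentrated list that follows.

```python
# Sketch of the scalable (power-law) rule behind the list above: with tail
# exponent ALPHA = 2, doubling the wealth level multiplies the rarity by 4.
# The anchor (1 in 62.5 above 1 million) is taken from the list itself.

ALPHA = 2
ANCHOR_LEVEL, ANCHOR_ODDS = 1, 62.5     # millions of euros; "1 in 62.5" above 1 million

for level in [1, 2, 4, 8, 16, 32, 320]:
    odds = ANCHOR_ODDS * (level / ANCHOR_LEVEL) ** ALPHA
    print(f"Higher than {level} million: 1 in {odds:,.10g}")
```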

Recall the comparison between the scalable and the nonscalable in Chapter 3. Scalability means that there is no headwind to slow you down.

Of course, Mandelbrotian Extremistan can take many shapes. Consider wealth in an extremely concentrated version of Extremistan; there, if you double the wealth, you halve the incidence. The result is quantitatively different from the above example, but it obeys the same logic.

Fractal Wealth Distribution with Large Inequalities

People with a net worth higher than €1 million: 1 in 63

Higher than €2 million: 1 in 125

Higher than €4 million: 1 in 250

Higher than €8 million: 1 in 500

Higher than €16 million: 1 in 1,000

Higher than €32 million: 1 in 2,000

Higher than €320 million: 1 in 20,000

Higher than €640 million: 1 in 40,000

If wealth were Gaussian, we would observe the following divergence away from €1 million.

Wealth Distribution Assuming a Gaussian Law

People with a net worth higher than €1 million: 1 in 63

Higher than €2 million: 1 in 127,000

Higher than €3 million: 1 in 14,000,000,000

Higher than €4 million: 1 in 886,000,000,000,000,000

Higher than €8 million: 1 in 16,000,000,000,000,000,000,000,000,000,000,000

Higher than €16 million: 1 in … none of my computers is capable of handling the computation.

What I want to show with these lists is the qualitative difference in the paradigms. As I have said, the second paradigm is scalable; it has no headwind. Note that another term for the scalable is power laws.

Just knowing that we are in a power-law environment does not tell us much. Why? Because we have to measure the coefficients in real life, which is much harder than with a Gaussian framework. Only the Gaussian yields its properties rather rapidly. The method I propose is a general way of viewing the world rather than a precise solution.
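
One way to see the difficulty: even when data are generated from a known power law (here an assumed exponent of 2), the exponent estimated from the tail observations, which is usually all you have to work with in real life, wanders from sample to sample.

```python
import random
from math import log

# Why the tail coefficient is hard to pin down: draw samples from a Pareto
# distribution whose true exponent is known (alpha = 2, an assumption for
# illustration), then estimate the exponent from the full sample and from
# the tail alone, which is what real data usually force you to do.

random.seed(3)
TRUE_ALPHA, X_MIN = 2.0, 1.0

def pareto_sample(n):
    # inverse-transform sampling: X = x_min * U**(-1/alpha)
    return [X_MIN * (1.0 - random.random()) ** (-1.0 / TRUE_ALPHA) for _ in range(n)]

def mle_alpha(xs, threshold):
    # maximum-likelihood (Hill-type) estimate using observations above threshold
    tail = [x for x in xs if x > threshold]
    return len(tail) / sum(log(x / threshold) for x in tail)

for trial in range(5):
    xs = pareto_sample(10_000)
    top_1_percent = sorted(xs)[-100]
    print(f"trial {trial}: alpha from full sample = {mle_alpha(xs, X_MIN):.2f}, "
          f"from the top 1% only = {mle_alpha(xs, top_1_percent):.2f}")
```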

What to Remember

Remember this: the Gaussian-bell curve variations face a headwind that makes probabilities drop at a faster and faster rate as you move away from the mean, while “scalables”, or Mandelbrotian variations, do not have such a restriction. That’s pretty much most of what you need to know.[53]

Inequality

Let us look more closely at the nature of inequality. In the Gaussian framework, inequality decreases as the deviations get larger – caused by the increase in the rate of decrease. Not so with the scalable: inequality stays the same throughout. The inequality among the superrich is the same as the inequality among the simply rich – it does not slow down.[54]

Consider this effect. Take a random sample of any two people from the U.S. population who jointly earn $1 million per annum. What is the most likely breakdown of their respective incomes? In Mediocristan, the most likely combination is half a million each. In Extremistan, it would be $50,000 and $950,000.

The situation is even more lopsided with book sales. If I told you that two authors sold a total of a million copies of their books, the most likely combination is 993,000 copies sold for one and 7,000 for the other. This is far more likely than that the books each sold 500,000 copies. For any large total, the breakdown will be more and more asymmetric.

Why is this so? The height problem provides a comparison. If I told you that the total height of two people is fourteen feet, you would identify the most likely breakdown as seven feet each, not two feet and twelve feet; not even eight feet and six feet! Persons taller than eight feet are so rare that such a combination would be impossible.
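
Here is a sketch of the "most likely breakdown" reasoning: along the line where two incomes add up to $1 million, find where the joint density peaks. The distributions and their parameters (a Gaussian income with mean $50,000, a Pareto income with a $50,000 floor and tail exponent 1.5) are illustrative assumptions, not estimates.

```python
from math import exp

# Sketch of the "most likely breakdown" of two incomes summing to $1 million.
# Parameters (mean, spread, minimum income, tail exponent) are illustrative.

TOTAL = 1_000_000

def gaussian_density(x, mean=50_000.0, sd=20_000.0):
    return exp(-((x - mean) ** 2) / (2 * sd * sd))      # unnormalized

def pareto_density(x, x_min=50_000.0, alpha=1.5):
    return x ** (-(alpha + 1)) if x >= x_min else 0.0   # unnormalized

def most_likely_split(density, step=1_000):
    best_x, best_p = None, -1.0
    for x in range(step, TOTAL, step):
        p = density(x) * density(TOTAL - x)   # joint density along the line x1 + x2 = TOTAL
        if p > best_p:
            best_x, best_p = x, p
    return best_x

for name, density in [("Mediocristan (Gaussian)", gaussian_density),
                      ("Extremistan (Pareto)   ", pareto_density)]:
    x = most_likely_split(density)
    print(f"{name}: most likely split is about ${x:,} and ${TOTAL - x:,}")
```

The Gaussian peak sits at the even split; the Pareto peak sits at the edge, one person near the floor and the other with nearly everything.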

Extremistan and the 80/20 Rule

Have you ever heard of the 80/20 rule? It is the common signature of a power law – actually it is how it all started, when Vilfredo Pareto made the observation that 80 percent of the land in Italy was owned by 20 percent of the people. Some use the rule to imply that 80 percent of the work is done by 20 percent of the people. Or that 80 percent worth of effort contributes to only 20 percent of results, and vice versa.

As far as axioms go, this one wasn’t phrased to impress you the most: it could easily be called the 50/01 rule, that is, 50 percent of the work comes from 1 percent of the workers. This formulation makes the world look even more unfair, yet the two formulae are exactly the same. How? Well, if there is inequality, then those who constitute the 20 percent in the 80/20 rule also contribute unequally – only a few of them deliver the lion’s share of the results. This trickles down to about one in a hundred contributing a little more than half the total.
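
The arithmetic is just the 80/20 rule applied to itself, assuming the same inequality holds inside the top group at each step:

```python
# The 80/20 rule applied to itself: if the top 20 percent deliver 80 percent
# of the output, and the same inequality holds within that top group, then
# iterating the rule gives roughly "50 from 1".

people, output = 1.0, 1.0
for level in range(1, 4):
    people *= 0.20
    output *= 0.80
    print(f"top {people:.1%} of contributors deliver about {output:.0%} of the output")
```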

The 80/20 rule is only metaphorical; it is not a rule, even less a rigid law. In the U.S. book business, the proportions are more like 97/20 (i.e., 97 percent of book sales are made by 20 percent of the authors); it’s even worse if you focus on literary nonfiction (twenty books out of close to eight thousand represent half the sales).

Note here that it is not all uncertainty. In some situations you may have a concentration, of the 80/20 type, with very predictable and tractable properties, which enables clear decision making, because you can identify beforehand where the meaningful 20 percent are. These situations are very easy to control. For instance, Malcolm Gladwell wrote in an article in The New Yorker that most abuse of prisoners is attributable to a very small number of vicious guards. Filter those guards out and your rate of prisoner abuse drops dramatically. (In publishing, on the other hand, you do not know beforehand which book will bring home the bacon. The same with wars, as you do not know beforehand which conflict will kill a portion of the planet’s residents.)

Grass and Trees

I’ll summarize here and repeat the arguments previously made throughout the book. Measures of uncertainty that are based on the bell curve simply disregard the possibility, and the impact, of sharp jumps or discontinuities and are, therefore, inapplicable in Extremistan. Using them is like focusing on the grass and missing out on the (gigantic) trees. Although unpredictable large deviations are rare, they cannot be dismissed as outliers because, cumulatively, their impact is so dramatic.

The traditional Gaussian way of looking at the world begins by focusing on the ordinary, and then deals with exceptions or so-called outliers as ancillaries. But there is a second way, which takes the exceptional as a starting point and treats the ordinary as subordinate.

I have emphasized that there are two varieties of randomness, qualitatively different, like air and water. One does not care about extremes; the other is severely impacted by them. One does not generate Black Swans; the other does. We cannot use the same techniques to discuss a gas as we would use with a liquid. And if we could, we wouldn’t call the approach “an approximation”. A gas does not “approximate” a liquid.

We can make good use of the Gaussian approach in variables for which there is a rational reason for the largest not to be too far away from the average. If there is gravity pulling numbers down, or if there are physical limitations preventing very large observations, we end up in Mediocristan. If there are strong forces of equilibrium bringing things back rather rapidly after conditions diverge from equilibrium, then again you can use the Gaussian approach. Otherwise, fuhgedaboudit. This is why much of economics is based on the notion of equilibrium: among other benefits, it allows you to treat economic phenomena as Gaussian.

Note that I am not telling you that the Mediocristan type of randomness does not allow for some extremes. But it tells you that they are so rare that they do not play a significant role in the total. The effect of such extremes is pitifully small and decreases as your population gets larger.

To be a little bit more technical here, if you have an assortment of giants and dwarfs, that is, observations several orders of magnitude apart, you could still be in Mediocristan. How? Assume you have a sample of one thousand people, with a large spectrum running from the dwarf to the giant. You are likely to see many giants in your sample, not a rare occasional one. Your average will not be impacted by the occasional additional giant because some of these giants are expected to be part of your sample, and your average is likely to be high. In other words, the largest observation cannot be too far away from the average. The average will always contain both kinds, giants and dwarfs, so that neither should be too rare – unless you get a megagiant or a microdwarf on very rare occasion. This would be Mediocristan with a large unit of deviation.

Note once again the following principle: the rarer the event, the higher the error in our estimation of its probability – even when using the Gaussian.

Let me show you how the Gaussian bell curve sucks randomness out of life – which is why it is popular. We like it because it allows certainties! How? Through averaging, as I will discuss next.

How Coffee Drinking Can Be Safe

Recall from the Mediocristan discussion in Chapter 3 that no single observation will impact your total. This property will be more and more significant as your population increases in size. The averages will become more and more stable, to the point where all samples will look alike.

I’ve had plenty of cups of coffee in my life (it’s my principal addiction). I have never seen a cup jump two feet from my desk, nor has coffee spilled spontaneously on this manuscript without intervention (even in Russia). Indeed, it will take more than a mild coffee addiction to witness such an event; it would require more lifetimes than is perhaps conceivable – the odds are so small, one in so many zeroes, that it would be impossible for me to write them down in my free time.

Yet physical reality makes it possible for my coffee cup to jump – very unlikely, but possible. Particles jump around all the time. How come the coffee cup, itself composed of jumping particles, does not? The reason is, simply, that for the cup to jump would require that all of the particles jump in the same direction, and do so in lockstep several times in a row (with a compensating move of the table in the opposite direction). All several trillion particles in my coffee cup are not going to jump in the same direction; this is not going to happen in the lifetime of this universe. So I can safely put the coffee cup on the edge of my writing table and worry about more serious sources of uncertainty.


FIGURE 7: How the Law of Large Numbers Works

In Mediocristan, as your sample size increases, the observed average will present itself with less and less dispersion – as you can see, the distribution will be narrower and narrower. This, in a nutshell, is how everything in statistical theory works (or is supposed to work). Uncertainty in Mediocristan vanishes under averaging. This illustrates the hackneyed “law of large numbers”.


The safety of my coffee cup illustrates how the randomness of the Gaussian is tamable by averaging. If my cup were one large particle, or acted as one, then its jumping would be a problem. But my cup is the sum of trillions of very small particles.

Casino operators understand this well, which is why they never (if they do things right) lose money. They simply do not let one gambler make a massive bet, instead preferring to have plenty of gamblers make series of bets of limited size. Gamblers may bet a total of $20 million, but you needn’t worry about the casino’s health: the bets run, say, $20 on average; the casino caps the bets at a maximum that will allow the casino owners to sleep at night. So the variations in the casino’s returns are going to be ridiculously small, no matter the total gambling activity. You will not see anyone leaving the casino with $1 billion – in the lifetime of this universe.

The above is an application of the supreme law of Mediocristan: when you have plenty of gamblers, no single gambler will impact the total more than minutely.

The consequence of this is that variations around the average of the Gaussian, also called “errors”, are not truly worrisome. They are small and they wash out. They are domesticated fluctuations around the mean.
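
A quick way to see the domestication, and its failure in Extremistan, is to ask how much of a total the single largest observation accounts for as the sample grows; the distributions and parameters below are assumptions for illustration.

```python
import random

# Sketch of why averaging tames Gaussian randomness but not the scalable kind:
# the share of the total taken by the single largest observation.
# Distribution parameters are illustrative assumptions.

random.seed(11)

def gaussian_draws(n):
    return [abs(random.gauss(0, 1)) for _ in range(n)]

def pareto_draws(n, alpha=1.1):
    return [(1.0 - random.random()) ** (-1.0 / alpha) for _ in range(n)]

for n in [1_000, 100_000]:
    for name, draws in [("Mediocristan", gaussian_draws(n)),
                        ("Extremistan ", pareto_draws(n))]:
        share = max(draws) / sum(draws)
        print(f"{name}, n={n:>7,}: largest single observation = {share:.1%} of the total")
```

In the Gaussian case the largest observation becomes a vanishing fraction of the total; in the scalable case it does not.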

Love of Certainties

If you ever took a (dull) statistics class in college, did not understand much of what the professor was excited about, and wondered what “standard deviation” meant, there is nothing to worry about. The notion of standard deviation is meaningless outside of Mediocristan. Clearly it would have been more beneficial, and certainly more entertaining, to have taken classes in the neurobiology of aesthetics or postcolonial African dance, and this is easy to see empirically.

Standard deviations do not exist outside the Gaussian, or if they do exist they do not matter and do not explain much. But it gets worse. The Gaussian family (which includes various friends and relatives, such as the Poisson law) is the only class of distributions that the standard deviation (and the average) is sufficient to describe. You need nothing else. The bell curve satisfies the reductionism of the deluded.

There are other notions that have little or no significance outside of the Gaussian: correlation and, worse, regression. Yet they are deeply ingrained in our methods; it is hard to have a business conversation without hearing the word correlation.

To see how meaningless correlation can be outside of Mediocristan, take a historical series involving two variables that are patently from Extremistan, such as the bond and the stock markets, or two securities prices, or two variables like, say, changes in book sales of children’s books in the United States, and fertilizer production in China; or real-estate prices in New York City and returns of the Mongolian stock market. Measure correlation between the pairs of variables in different subperiods, say, for 1994, 1995, 1996, etc. The measured correlation is likely to exhibit severe instability; it will depend on the period for which it was computed. Yet people talk about correlation as if it were something real, making it tangible, investing it with a physical property, reifying it.

The same illusion of concreteness affects what we call “standard” deviations. Take any series of historical prices or values. Break it up into subsegments and measure its “standard” deviation. Surprised? Every sample will yield a different “standard” deviation. Then why do people talk about standard deviations? Go figure.

Note here that, as with the narrative fallacy, when you look at past data and compute one single correlation or standard deviation, you do not notice such instability.
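
A sketch of that instability: the same two-variable factor structure, measured year by year, once with Gaussian shocks and once with heavy-tailed shocks (all parameters invented for illustration). The Gaussian correlations sit close to their true value; the heavy-tailed ones typically jump around.

```python
import random
from math import sqrt

# Sketch of correlation instability: an identical factor structure, once with
# Gaussian shocks and once with heavy-tailed shocks (illustrative parameters),
# measured over successive "years" of 250 observations.

random.seed(5)
YEARS, DAYS = 6, 250

def gaussian_shock():
    return random.gauss(0, 1)

def heavy_shock(alpha=1.5):
    sign = 1 if random.random() < 0.5 else -1
    return sign * ((1.0 - random.random()) ** (-1.0 / alpha) - 1.0)

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

for name, shock in [("Gaussian shocks    ", gaussian_shock),
                    ("heavy-tailed shocks", heavy_shock)]:
    estimates = []
    for _ in range(YEARS):
        xs, ys = [], []
        for _ in range(DAYS):
            common = shock()
            xs.append(common + shock())
            ys.append(common + shock())
        estimates.append(correlation(xs, ys))
    print(name, "year-by-year correlations:",
          " ".join(f"{c:5.2f}" for c in estimates))
```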

How to Cause Catastrophes

If you use the term statistically significant, beware of the illusions of certainties. Odds are that someone has looked at his observation errors and assumed that they were Gaussian, which necessitates a Gaussian context, namely, Mediocristan, for it to be acceptable.

To show how endemic the problem of misusing the Gaussian is, and how dangerous it can be, consider a (dull) book called Catastrophe by Judge Richard Posner, a prolific writer. Posner bemoans civil servants’ misunderstandings of randomness and recommends, among other things, that government policy makers learn statistics … from economists. Judge Posner appears to be trying to foment catastrophes. Yet, in spite of being one of those people who should spend more time reading and less time writing, he can be an insightful, deep, and original thinker; like many people, he just isn’t aware of the distinction between Mediocristan and Extremistan, and he believes that statistics is a “science”, never a fraud. If you run into him, please make him aware of these things.

QUÉTELET’S AVERAGE MONSTER

This monstrosity called the Gaussian bell curve is not Gauss’s doing. Although he worked on it, he was a mathematician dealing with a theoretical point, not making claims about the structure of reality like statistical-minded scientists. G.H. Hardy wrote in “A Mathematician’s Apology”:

The “real” mathematics of the “real” mathematicians, the mathematics of Fermat and Euler and Gauss and Abel and Riemann, is almost wholly “useless” (and this is as true of “applied” as of “pure” mathematics).

As I mentioned earlier, the bell curve was mainly the concoction of a gambler, Abraham de Moivre (1667-1754), a French Calvinist refugee who spent much of his life in London, though speaking heavily accented English. But it is Quételet, not Gauss, who counts as one of the most destructive fellows in the history of thought, as we will see next.

Adolphe Quételet (1796-1874) came up with the notion of a physically average human, l’homme moyen. There was nothing moyen about Quételet, “a man of great creative passions, a creative man full of energy”. He wrote poetry and even coauthored an opera. The basic problem with Quételet was that he was a mathematician, not an empirical scientist, but he did not know it. He found harmony in the bell curve.

The problem exists at two levels. Primo, Quételet had a normative idea, to make the world fit his average, in the sense that the average, to him, was the “normal”. It would be wonderful to be able to ignore the contribution of the unusual, the “nonnormal”, the Black Swan, to the total. But let us leave that dream for Utopia.

Secondo, there was a serious associated empirical problem. Quételet saw bell curves everywhere. He was blinded by bell curves and, I have learned, again, once you get a bell curve in your head it is hard to get it out. Later, Francis Ysidro Edgeworth would refer to Quételismus as the grave mistake of seeing bell curves everywhere.

Golden Mediocrity

Quételet provided a much needed product for the ideological appetites of his day. He lived between 1796 and 1874, so consider the roster of his contemporaries: Saint-Simon (1760-1825), Pierre-Joseph Proudhon (1809-1865), and Karl Marx (1818-1883), each the source of a different version of socialism. Everyone in this post-Enlightenment moment was longing for the aurea mediocritas, the golden mean: in wealth, height, weight, and so on. This longing contains some element of wishful thinking mixed with a great deal of harmony and … Platonicity.

I always remember my father’s injunction that in medio stat virtus, “virtue lies in moderation”. Well, for a long time that was the ideal; mediocrity, in that sense, was even deemed golden. All-embracing mediocrity.

But Quételet took the idea to a different level. Collecting statistics, he started creating standards of “means”. Chest size, height, the weight of babies at birth, very little escaped his standards. Deviations from the norm, he found, became exponentially more rare as the magnitude of the deviation increased. Then, having conceived of this idea of the physical characteristics of l’homme moyen, Monsieur Quételet switched to social matters. L’homme moyen had his habits, his consumption, his methods.

Through his construct of l’homme moyen physique and l’homme moyen moral, the physically and morally average man, Quételet created a range of deviance from the average that positions all people either to the left or right of center and, truly, punishes those who find themselves occupying the extreme left or right of the statistical bell curve. They became abnormal. How this inspired Marx, who cites Quételet regarding this concept of an average or normal man, is obvious: “Societal deviations in terms of the distribution of wealth for example, must be minimized”, he wrote in Das Kapital.

One has to give some credit to the scientific establishment of Quételet’s day. They did not buy his arguments at once. The philosopher/mathematician/economist Augustin Cournot, for starters, did not believe that one could establish a standard human on purely quantitative grounds. Such a standard would be dependent on the attribute under consideration. A measurement in one province may differ from that in another province. Which one should be the standard? L’homme moyen would be a monster, said Cournot. I will explain his point as follows.

Assuming there is something desirable in being an average man, he must have an unspecified specialty in which he would be more gifted than other people – he cannot be average in everything. A pianist would be better on average at playing the piano, but worse than the norm at, say, horseback riding. A draftsman would have better drafting skills, and so on. The notion of a man deemed average is different from that of a man who is average in everything he does. In fact, an exactly average human would have to be half male and half female. Quételet completely missed that point.

God’s Error

A much more worrisome aspect of the discussion is that in Quételet’s day, the name of the Gaussian distribution was la loi des erreurs, the law of errors, since one of its earliest applications was the distribution of errors in astronomic measurements. Are you as worried as I am? Divergence from the mean (here the median as well) was treated precisely as an error! No wonder Marx fell for Quételet’s ideas.

This concept took off very quickly. The ought was confused with the is, and this with the imprimatur of science. The notion of the average man is steeped in the culture attending the birth of the European middle class, the nascent post-Napoleonic shopkeeper’s culture, chary of excessive wealth and intellectual brilliance. In fact, the dream of a society with compressed outcomes is assumed to correspond to the aspirations of a rational human being facing a genetic lottery. If you had to pick a society to be born into for your next life, but could not know which outcome awaited you, it is assumed you would probably take no gamble; you would like to belong to a society without divergent outcomes.

One entertaining effect of the glorification of mediocrity was the creation of a political party in France called Poujadism, composed initially of a grocery-store movement. It was the warm huddling together of the semi-favored hoping to see the rest of the universe compress itself into their rank – a case of non-proletarian revolution. It had a grocery-store-owner mentality, down to the employment of the mathematical tools. Did Gauss provide the mathematics for the shopkeepers?

Poincaré to the Rescue

Poincaré himself was quite suspicious of the Gaussian. I suspect that he felt queasy when it and similar approaches to modeling uncertainty were presented to him. Just consider that the Gaussian was initially meant to measure astronomic errors, and that Poincaré’s ideas of modeling celestial mechanics were fraught with a sense of deeper uncertainty.

Poincaré wrote that one of his friends, an unnamed “eminent physicist”, complained to him that physicists tended to use the Gaussian curve because they thought mathematicians believed it a mathematical necessity; mathematicians used it because they believed that physicists found it to be an empirical fact.

Eliminating Unfair Influence

Let me state here that, except for the grocery-store mentality, I truly believe in the value of middleness and mediocrity – what humanist does not want to minimize the discrepancy between humans? Nothing is more repugnant than the inconsiderate ideal of the Übermensch! My true problem is epistemological. Reality is not Mediocristan, so we should learn to live with it.

“The Greeks Would Have Deified It”

The list of people walking around with the bell curve stuck in their heads, thanks to its Platonic purity, is incredibly long.

Sir Francis Galton, Charles Darwin’s first cousin and Erasmus Darwin’s grandson, was perhaps, along with his cousin, one of the last independent gentlemen scientists – a category that also included Lord Cavendish, Lord Kelvin, Ludwig Wittgenstein (in his own way), and to some extent, our überphilosopher Bertrand Russell. Although John Maynard Keynes was not quite in that category, his thinking epitomizes it. Galton lived in the Victorian era when heirs and persons of leisure could, among other choices, such as horseback riding or hunting, become thinkers, scientists, or (for those less gifted) politicians. There is much to be wistful about in that era: the authenticity of someone doing science for science’s sake, without direct career motivations.

Unfortunately, doing science for the love of knowledge does not necessarily mean you will head in the right direction. Upon encountering and absorbing the “normal” distribution, Galton fell in love with it. He was said to have exclaimed that if the Greeks had known about it, they would have deified it. His enthusiasm may have contributed to the prevalence of the use of the Gaussian.

Galton was blessed with no mathematical baggage, but he had a rare obsession with measurement. He did not know about the law of large numbers, but rediscovered it from the data itself. He built the quincunx, a pinball machine that shows the development of the bell curve – on which, more in a few paragraphs. True, Galton applied the bell curve to areas like genetics and heredity, in which its use was justified. But his enthusiasm helped thrust nascent statistical methods into social issues.

“Yes/No” Only Please

Let me discuss here the extent of the damage. If you’re dealing with qualitative inference, such as in psychology or medicine, looking for yes/no answers to which magnitudes don’t apply, then you can assume you’re in Mediocristan without serious problems. The impact of the improbable cannot be too large. You have cancer or you don’t, you are pregnant or you are not, et cetera. Degrees of deadness or pregnancy are not relevant (unless you are dealing with epidemics). But if you are dealing with aggregates, where magnitudes do matter, such as income, your wealth, return on a portfolio, or book sales, then you will have a problem and get the wrong distribution if you use the Gaussian, as it does not belong there. One single number can disrupt all your averages; one single loss can eradicate a century of profits. You can no longer say “this is an exception”. The statement “Well, I can lose money” is not informational unless you can attach a quantity to that loss. You can lose all your net worth or you can lose a fraction of your daily income; there is a difference.

This explains why empirical psychology and its insights on human nature, which I presented in the earlier parts of this book, are robust to the mistake of using the bell curve; they are also lucky, since most of their variables allow for the application of conventional Gaussian statistics. When measuring how many people in a sample have a bias, or make a mistake, these studies generally elicit a yes/no type of result. No single observation, by itself, can disrupt their overall findings.

I will next proceed to a sui generis presentation of the bell-curve idea from the ground up.

A (LITERARY) THOUGHT EXPERIMENT ON WHERE THE BELL CURVE COMES FROM

Consider a pinball machine like the one shown in Figure 8. Launch 32 balls, assuming a well-balanced board so that the ball has equal odds of falling right or left at any juncture when hitting a pin. Your expected outcome is that many balls will land in the center columns and that the number of balls will decrease as you move to the columns away from the center.

Next, consider a gedanken, a thought experiment. A man flips a coin and after each toss he takes a step to the left or a step to the right, depending on whether the coin came up heads or tails. This is called the random walk, but it does not necessarily concern itself with walking. You could identically say that instead of taking a step to the left or to the right, you would win or lose $1 at every turn, and you will keep track of the cumulative amount that you have in your pocket.
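For the reader who prefers to watch the walk rather than imagine it, here is a minimal sketch in Python of the cumulative-dollar version just described; the hundred-flip length and the seed are arbitrary editorial choices, not anything prescribed by the argument.

import random

# One realization of the random walk: a hundred fair coin flips at $1 each,
# tracking the cumulative amount in your pocket.
random.seed(7)   # arbitrary seed, for reproducibility only
pocket = 0
for _ in range(100):
    pocket += 1 if random.random() < 0.5 else -1
print(pocket)    # the net result; most runs end not far from zero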


Assume that I set you up in a (legal) wager where the odds are neither in your favor nor against you. Flip a coin. Heads, you make $1, tails, you lose $1.

At the first flip, you will either win or lose.

At the second flip, the number of possible outcomes doubles. Case one: win, win. Case two: win, lose. Case three: lose, win. Case four: lose, lose. Each of these cases has equivalent odds; the combination of a single win and a single loss has an incidence twice as high because cases two and three, win-lose and lose-win, amount to the same outcome. And that is the key for the Gaussian. So much in the middle washes out – and we will see that there is a lot in the middle. So, if you are playing for $1 a round, after two rounds you have a 25 percent chance of making or losing $2, but a 50 percent chance of breaking even.

FIGURE 8: THE QUINCUNX (SIMPLIFIED) – A PINBALL MACHINE

Drop balls that, at every pin, randomly fall right or left. Above is the most probable scenario, which greatly resembles the bell curve (a.k.a. Gaussian distribution). Courtesy of Alexander Taleb.


Let us do another round. The third flip again doubles the number of cases, so we face eight possible outcomes. Case 1 (it was win, win in the second flip) branches out into win, win, win and win, win, lose. We add a win or lose to the end of each of the previous results. Case 2 branches out into win, lose, win and win, lose, lose. Case 3 branches out into lose, win, win and lose, win, lose. Case 4 branches out into lose, lose, win and lose, lose, lose.

We now have eight cases, all equally likely. Note that again you can group the middling outcomes where a win cancels out a loss. (In Galton’s quincunx, situations where the ball falls left and then falls right, or vice versa, dominate so you end up with plenty in the middle.) The net, or cumulative, is the following: 1) three wins; 2) two wins, one loss, net one win; 3) two wins, one loss, net one win; 4) one win, two losses, net one loss; 5) two wins, one loss, net one win; 6) two losses, one win, net one loss; 7) two losses, one win, net one loss; and, finally, 8) three losses.

Out of the eight cases, the case of three wins occurs once. The case of three losses occurs once. The case of one net loss (one win, two losses) occurs three times. The case of one net win (one loss, two wins) occurs three times.

Play one more round, the fourth. There will be sixteen equally likely outcomes. You will have one case of four wins, one case of four losses, four cases of a net two wins (three wins, one loss), four cases of a net two losses (one win, three losses), and six break-even cases.
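As a check on the counting, a minimal sketch in Python – an editorial illustration, not part of the original argument – that enumerates all sixteen equally likely four-flip sequences and tallies the net outcomes:

from itertools import product
from collections import Counter

# Enumerate every equally likely four-flip sequence (+1 = win, -1 = loss)
# and count the net outcomes; the tallies 1, 4, 6, 4, 1 are the proto-bell curve.
counts = Counter(sum(seq) for seq in product((+1, -1), repeat=4))
for net, n in sorted(counts.items()):
    print(f"net {net:+d}: {n} out of {2 ** 4} cases")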

The quincunx (its name is derived from the Latin for five) in the pinball example shows the fifth round, with thirty-two possibilities, easy to track. Such was the concept behind the quincunx used by Francis Galton. Galton was both insufficiently lazy and a bit too innocent of mathematics; instead of building the contraption, he could have worked with simpler algebra, or perhaps undertaken a thought experiment like this one.

Let’s keep playing. Continue until you have forty flips. You can perform them in minutes, but we will need a calculator to work out the number of outcomes, which are taxing to our simple thought method. You will have about 1,099,511,627,776 possible combinations – more than one thousand billion. Don’t bother doing the calculation manually, it is two multiplied by itself forty times, since each branch doubles at every juncture. (Recall that we added a win and a lose at the end of the alternatives of the third round to go to the fourth round, thus doubling the number of alternatives.) Of these combinations, only one will be up forty, and only one will be down forty. The rest will hover around the middle, here zero.

We can already see that in this type of randomness extremes are exceedingly rare. One in 1,099,511,627,776 is up forty out of forty tosses. If you perform the exercise of forty flips once per hour, the odds of getting 40 ups in a row are so small that it would take quite a few forty-flip trials to see it. Assuming you take a few breaks to eat, argue with your friends and roommates, have a beer, and sleep, you can expect to wait close to four million lifetimes to get a 40-up outcome (or a 40-down outcome) just once. And consider the following. Assume you play one additional round, for a total of 41; to get 41 straight heads would take eight million lifetimes! Going from 40 to 41 halves the odds. This is a key attribute of the nonscalable framework for analyzing randomness: extreme deviations decrease at an increasing rate. You can expect to toss 50 heads in a row once in four billion lifetimes!
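The lifetime arithmetic is easy to verify. The sketch below assumes, somewhat arbitrarily, ten forty-flip sessions a day over an eighty-year life – one way to read “a few breaks”:

# Waiting-time check: expected number of forty-flip trials before one comes up
# all heads, converted to lifetimes under the assumed schedule above.
trials_needed = 2 ** 40                        # 1,099,511,627,776
trials_per_lifetime = 10 * 365 * 80            # assumed: ten sessions a day, eighty years
print(trials_needed / trials_per_lifetime)     # roughly 3.8 million lifetimes
print(2 ** 50 / 2 ** 40)                       # 50 in a row is about 1,000 times rarer still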


FIGURE 9: NUMBERS OF WINS TOSSED

Result of forty tosses. We see the proto-bell curve emerging.

We are not yet fully in a Gaussian bell curve, but we are getting dangerously close. This is still proto-Gaussian, but you can see the gist. (Actually, you will never encounter a Gaussian in its purity since it is a Platonic form – you just get closer but cannot attain it.) However, as you can see in Figure 9, the familiar bell shape is starting to emerge.

How do we get even closer to the perfect Gaussian bell curve? By refining the flipping process. We can either flip 40 times for $1 a flip or 4,000 times for ten cents a flip, and add up the results. Your expected risk is about the same in both situations – and that is a trick. The equivalence in the two sets of flips has a little nonintuitive hitch. We multiplied the number of bets by 100, but divided the bet size by 10 – don’t look for a reason now, just assume that they are “equivalent”. The overall risk is equivalent, but now we have opened up the possibility of winning or losing 400 times in a row. The odds are about one in 1 with 120 zeroes after it, that is, one in 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 times.
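The “equivalence” is an equivalence of standard deviation: the stake shrinks by ten while the number of flips grows by one hundred, and the square root of one hundred is ten. A minimal sketch, with the one-in-10^120 figure checked as well:

import math

# Same overall risk: the standard deviation of the total is the stake times sqrt(n).
print(1.00 * math.sqrt(40))       # 40 flips at $1: about $6.32
print(0.10 * math.sqrt(4_000))    # 4,000 flips at 10 cents: also about $6.32

# Losing 400 times in a row has probability 2**-400, i.e., about 1 in 10**120.
print(400 * math.log10(2))        # about 120.4 zeroes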

Continue the process for a while. We go from 40 tosses for $1 each to 4,000 tosses for 10 cents, to 400,000 tosses for 1 cent, getting closer and closer to a Gaussian. Figure 10 shows results spread between –40 and 40, namely eighty plot points. The next one would bring that up to 8,000 points.

FIGURE 10: A MORE ABSTRACT VERSION: PLATO’S CURVE

An infinite number of tosses.

Let’s keep going. We can flip 4,000 times staking a tenth of a penny. How about 400,000 times at 1/1000 of a penny? As a Platonic form, the pure Gaussian curve is principally what happens when we have an infinity of tosses per round, with each bet infinitesimally small. Do not bother trying to visualize the results, or even make sense out of them. We can no longer talk about an “infinitesimal” bet size (since we have an infinity of these, and we are in what mathematicians call a continuous framework). The good news is that there is a substitute.

We have moved from a simple bet to something completely abstract. We have moved from observations into the realm of mathematics. In mathematics things have a purity to them.

Now, something completely abstract is not supposed to exist, so please do not even make an attempt to understand Figure 10. Just be aware of its use. Think of it as a thermometer: you are not supposed to understand what the temperature means in order to talk about it. You just need to know the correspondence between temperature and comfort (or some other empirical consideration). Sixty degrees corresponds to pleasant weather; ten below is not something to look forward to. You don’t necessarily care about the actual speed of the collisions among particles that more technically explains temperature. Degrees are, in a way, a means for your mind to translate some external phenomena into a number. Likewise, the Gaussian bell curve is set so that 68.2 percent of the observations fall between minus one and plus one standard deviations away from the average. I repeat: do not even try to understand whether standard deviation is average deviation – it is not, and a large (too large) number of people using the word standard deviation do not understand this point. Standard deviation is just a number that you scale things to, a matter of mere correspondence if phenomena were Gaussian.

These standard deviations are often nicknamed “sigma”. People also talk about “variance” (same thing: variance is the square of the sigma, i.e., of the standard deviation).

Note the symmetry in the curve. You get the same results whether the sigma is positive or negative. The odds of falling below –4 sigmas are the same as those of exceeding 4 sigmas, here 1 in 32,000 times.
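Both numbers – the 68.2 percent and the “1 in 32,000” – can be recovered from the Gaussian itself. A minimal sketch using only the Python standard library:

import math

def gaussian_tail(sigmas: float) -> float:
    """One-sided probability that a Gaussian exceeds `sigmas` standard deviations."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2))

print(1 - 2 * gaussian_tail(1))   # about 0.682: within one sigma of the mean
print(1 / gaussian_tail(4))       # about 31,600: the "1 in 32,000" odds of a 4-sigma event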

As the reader can see, the main point of the Gaussian bell curve is, as I have been saying, that most observations hover around the mediocre, the mean, while the odds of a deviation decline faster and faster (exponentially) as you move away from the mean. If you need to retain one single piece of information, just remember this dramatic speed of decrease in the odds as you move away from the average. Outliers are increasingly unlikely. You can safely ignore them.

This property also generates the supreme law of Mediocristan: given the paucity of large deviations, their contribution to the total will be vanishingly small.

In the height example earlier in this chapter, I used units of deviations of ten centimeters, showing how the incidence declined as the height increased. These were one sigma deviations; the height table also provides an example of the operation of “scaling to a sigma” by using the sigma as a unit of measurement.

Those Comforting Assumptions

Note the central assumptions we made in the coin-flip game that led to the proto-Gaussian, or mild randomness.

First central assumption: the flips are independent of one another. The coin has no memory. The fact that you got heads or tails on the previous flip does not change the odds of your getting heads or tails on the next one. You do not become a “better” coin flipper over time. If you introduce memory, or skills in flipping, the entire Gaussian business becomes shaky.

Recall our discussions in Chapter 14 on preferential attachment and cumulative advantage. Both theories assert that winning today makes you more likely to win in the future. Therefore, probabilities are dependent on history, and the first central assumption leading to the Gaussian bell curve fails in reality. In games, of course, past winnings are not supposed to translate into an increased probability of future gains – but not so in real life, which is why I worry about teaching probability from games. But when winning leads to more winning, you are far more likely to see forty wins in a row than with a proto-Gaussian.

Second central assumption: no “wild” jump. The step size in the building block of the basic random walk is always known, namely one step. There is no uncertainty as to the size of the step. We did not encounter situations in which the move varied wildly.

Remember that if either of these two central assumptions is not met, your moves (or coin tosses) will not cumulatively lead to the bell curve. Depending on what happens, they can lead to the wild Mandelbrotian-style scale-invariant randomness.
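To see how fragile the first assumption is, consider the toy sketch below – not any particular theory of cumulative advantage, just a crude reinforcement rule in which every win raises the odds of the next one. The concentration of the independent game evaporates.

import random

def net_after(rounds: int, reinforced: bool) -> int:
    """Net wins minus losses. If `reinforced`, each win raises the odds of the
    next win (a toy urn-style scheme); otherwise the flips are fair and independent."""
    wins = losses = 0
    for _ in range(rounds):
        p = 0.5 if not reinforced else (wins + 1) / (wins + losses + 2)
        if random.random() < p:
            wins += 1
        else:
            losses += 1
    return wins - losses

random.seed(1)
for reinforced in (False, True):
    results = [net_after(100, reinforced) for _ in range(10_000)]
    share_extreme = sum(abs(r) >= 60 for r in results) / len(results)
    print("reinforced" if reinforced else "independent", share_extreme)

With independent flips, a net of sixty or more over a hundred rounds essentially never shows up; with the reinforcement rule, it happens a large share of the time.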

“The Ubiquity of the Gaussian”

One of the problems I face in life is that whenever I tell people that the Gaussian bell curve is not ubiquitous in real life, only in the minds of statisticians, they require me to “prove it” – which is easy to do, as we will see in the next two chapters, yet nobody has managed to prove the opposite. Whenever I suggest a process that is not Gaussian, I am asked to justify my suggestion and to, beyond the phenomena, “give them the theory behind it”. We saw in Chapter 14 the rich-get-richer models that were proposed in order to justify not using a Gaussian. Modelers were forced to spend their time writing theories on possible models that generate the scalable – as if they needed to be apologetic about it. Theory shmeory! I have an epistemological problem with that, with the need to justify the world’s failure to resemble an idealized model that someone blind to reality has managed to promote.

My technique, instead of studying the possible models generating non-bell curve randomness, hence making the same errors of blind theorizing, is to do the opposite: to know the bell curve as intimately as I can and identify where it can and cannot hold. I know where Mediocristan is. To me it is frequently (nay, almost always) the users of the bell curve who do not understand it well, and have to justify it, and not the opposite.

This ubiquity of the Gaussian is not a property of the world, but a problem in our minds, stemming from the way we look at it.


The next chapter will address the scale invariance of nature and address the properties of the fractal. The chapter after that will probe the misuse of the Gaussian in socioeconomic life and “the need to produce theories”. I sometimes get a little emotional because I’ve spent a large part of my life thinking about this problem. Since I started thinking about it, and conducting a variety of thought experiments as I have above, I have not for the life of me been able to find anyone around me in the business and statistical world who was intellectually consistent in that he both accepted the Black Swan and rejected the Gaussian and Gaussian tools. Many people accepted my Black Swan idea but could not take it to its logical conclusion, which is that you cannot use one single measure for randomness called standard deviation (and call it “risk”); you cannot expect a simple answer to characterize uncertainty. To go the extra step requires courage, commitment, an ability to connect the dots, a desire to understand randomness fully. It also means not accepting other people’s wisdom as gospel. Then I started finding physicists who had rejected the Gaussian tools but fell for another sin: gullibility about precise predictive models, mostly elaborations around the preferential attachment of Chapter 14 – another form of Platonicity. I could not find anyone with depth and scientific technique who looked at the world of randomness and understood its nature, who looked at calculations as an aid, not a principal aim. It took me close to a decade and a half to find that thinker, the man who made many swans gray: Mandelbrot – the great Benoît Mandelbrot.

Chapter Sixteen: THE AESTHETICS OF RANDOMNESS

Mandelbrot’s library – Was Galileo blind? – Pearls to swine – Self-affinity – How the world can be complicated in a simple way, or, perhaps, simple in a very complicated way

THE POET OF RANDOMNESS

It was a melancholic afternoon when I smelled the old books in Benoît Mandelbrot’s library. This was on a hot day in August 2005, and the heat exacerbated the musty odor of the glue of old French books bringing on powerful olfactory nostalgia. I usually succeed in repressing such nostalgic excursions, but not when they sneak up on me as music or smell. The odor of Mandelbrot’s books was that of French literature, of my parents’ library, of the hours spent in bookstores and libraries when I was a teenager when many books around me were (alas) in French, when I thought that Literature was above anything and everything. (I haven’t been in contact with many French books since my teenage days.) However abstract I wanted it to be, Literature had a physical embodiment, it had a smell, and this was it.

The afternoon was also gloomy because Mandelbrot was moving away, exactly when I had become entitled to call him at crazy hours just because I had a question, such as why people didn’t realize that the 80/20 could be 50/01. Mandelbrot had decided to move to the Boston area, not to retire, but to work for a research center sponsored by a national laboratory. Since he was moving to an apartment in Cambridge, and leaving his oversize house in the Westchester suburbs of New York, he had invited me to come take my pick of his books.

Even the titles of the books had a nostalgic ring. I filled up a box with French titles, such as a 1949 copy of Henri Bergson’s Matière et mémoire, which it seemed Mandelbrot bought when he was a student (the smell!).

After having mentioned his name left and right throughout this book, I will finally introduce Mandelbrot, principally as the first person with an academic title with whom I ever spoke about randomness without feeling defrauded. Other mathematicians of probability would throw at me theorems with Russian names such as “Sobolev”, “Kolmogorov”, Wiener measure, without which they were lost; they had a hard time getting to the heart of the subject or exiting their little box long enough to consider its empirical flaws. With Mandelbrot, it was different: it was as if we both originated from the same country, meeting after years of frustrating exile, and were finally able to speak in our mother tongue without straining. He is the only flesh-and-bones teacher I ever had – my teachers are usually books in my library. I had way too little respect for mathematicians dealing with uncertainty and statistics to consider any of them my teachers – in my mind mathematicians, trained for certainties, had no business dealing with randomness. Mandelbrot proved me wrong.

He speaks an unusually precise and formal French, much like that spoken by Levantines of my parents’ generation or Old World aristocrats. This made it odd to hear, on occasion, his accented, but very standard, colloquial American English. He is tall, overweight, which makes him look baby-faced (although I’ve never seen him eat a large meal), and has a strong physical presence.

From the outside one would think that what Mandelbrot and I have in common is wild uncertainty, Black Swans, and dull (and sometimes less dull) statistical notions. But, although we are collaborators, this is not what our major conversations revolve around. It is mostly matters literary and aesthetic, or historical gossip about people of extraordinary intellectual refinement. I mean refinement, not achievement. Mandelbrot could tell stories about the phenomenal array of hotshots he has worked with over the past century, but somehow I am programmed to consider scientists’ personae far less interesting than those of colorful erudites. Like me, Mandelbrot takes an interest in urbane individuals who combine traits generally thought not to coexist together. One person he often mentions is Baron Pierre Jean de Menasce, whom he met at Princeton in the 1950s, where de Menasce was the roommate of the physicist Oppenheimer. De Menasce was exactly the kind of person I am interested in, the embodiment of a Black Swan. He came from an opulent Alexandrian Jewish merchant family, French and Italian-speaking like all sophisticated Levantines. His forebears had taken a Venetian spelling for their Arabic name, added a Hungarian noble title along the way, and socialized with royalty. De Menasce not only converted to Christianity, but became a Dominican priest and a great scholar of Semitic and Persian languages. Mandelbrot kept questioning me about Alexandria, since he was always looking for such characters.

True, intellectually sophisticated characters were exactly what I looked for in life. My erudite and polymathic father – who, were he still alive, would have only been two weeks older than Benoît M. – liked the company of extremely cultured Jesuit priests. I remember these Jesuit visitors occupying my chair at the dining table. I recall that one had a medical degree and a PhD in physics, yet taught Aramaic to locals in Beirut’s Institute of Eastern Languages. His previous assignment could have been teaching high school physics, and the one before that was perhaps in the medical school. This kind of erudition impressed my father far more than scientific assembly-line work. I may have something in my genes driving me away from bildungsphilisters.

Although Mandelbrot often expressed amazement at the temperament of high-flying erudites and remarkable but not-so-famous scientists, such as his old friend Carleton Gajdusek, a man who impressed him with his ability to uncover the causes of tropical diseases, he did not seem eager to trumpet his association with those we consider great scientists. It took me a while to discover that he had worked with an impressive list of scientists in seemingly every field, something a name-dropper would have brought up continuously. Although I have been working with him for a few years now, only the other day, as I was chatting with his wife, did I discover that he spent two years as the mathematical collaborator of the psychologist Jean Piaget. Another shock came when I discovered that he had also worked with the great historian Fernand Braudel, but Mandelbrot did not seem to be interested in Braudel. He did not care to discuss John von Neumann, with whom he had worked as a postdoctoral fellow. His scale was inverted. I asked him once about Charles Tresser, an unknown physicist I met at a party who wrote papers on chaos theory and supplemented his researcher’s income by making pastry for a shop he ran near New York City. He was emphatic: “un homme extraordinaire”, he called Tresser, and could not stop praising him. But when I asked him about a particular famous hotshot, he replied, “He is the prototypical bon élève, a student with good grades, no depth, and no vision”. That hotshot was a Nobel laureate.

THE PLATONICITY OF TRIANGLES

Now, why am I calling this business Mandelbrotian, or fractal, randomness? Every single bit and piece of the puzzle has been previously mentioned by someone else, such as Pareto, Yule, and Zipf, but it was Mandelbrot who a) connected the dots, b) linked randomness to geometry (and a special brand at that), and c) took the subject to its natural conclusion. Indeed many mathematicians are famous today partly because he dug out their works to back up his claims – the strategy I am following here in this book. “I had to invent my predecessors, so people take me seriously”, he once told me, and he used the credibility of big guns as a rhetorical device. One can almost always ferret out predecessors for any thought. You can always find someone who worked on a part of your argument and use his contribution as your backup. The scientific association with a big idea, the “brand name”, goes to the one who connects the dots, not the one who makes a casual observation – even Charles Darwin, who uncultured scientists claim “invented” the survival of the fittest, was not the first to mention it. He wrote in the introduction of The Origin of Species that the facts he presented were not necessarily original; it was the consequences that he thought were “interesting” (as he put it with characteristic Victorian modesty). In the end it is those who derive consequences and seize the importance of the ideas, seeing their real value, who win the day. They are the ones who can talk about the subject. So let me describe Mandelbrotian geometry.

The Geometry of Nature

Triangles, squares, circles, and the other geometric concepts that made many of us yawn in the classroom may be beautiful and pure notions, but they seem more present in the minds of architects, design artists, modern art buildings, and schoolteachers than in nature itself. That’s fine, except that most of us aren’t aware of this. Mountains are not triangles or pyramids; trees are not circles; straight lines are almost never seen anywhere. Mother Nature did not attend high school geometry courses or read the books of Euclid of Alexandria. Her geometry is jagged, but with a logic of its own and one that is easy to understand.

I have said that we seem naturally inclined to Platonify, and to think exclusively in terms of studied material: nobody, whether a bricklayer or a natural philosopher, can easily escape the enslavement of such conditioning. Consider that the great Galileo, otherwise a debunker of falsehoods, wrote the following:

The great book of Nature lies ever open before our eyes and the true philosophy is written in it. … But we cannot read it unless we have first learned the language and the characters in which it is written. … It is written in mathematical language and the characters are triangles, circles and other geometric figures.

Was Galileo legally blind? Even the great Galileo, with all his alleged independence of mind, was not capable of taking a clean look at Mother Nature. I am confident that he had windows in his house and that he ventured outside from time to time: he should have known that triangles are not easily found in nature. We are so easily brainwashed.

We are either blind, or illiterate, or both. That nature’s geometry is not Euclid’s was so obvious, and nobody, almost nobody, saw it.

This (physical) blindness is identical to the ludic fallacy that makes us think casinos represent randomness.

Fractality

But first, a description of fractals. Then we will show how they link to what we call power laws, or scalable laws.

Fractal is a word Mandelbrot coined to describe the geometry of the rough and broken – from the Latin fractus, the origin of fractured. Fractality is the repetition of geometric patterns at different scales, revealing smaller and smaller versions of themselves. Small parts resemble, to some degree, the whole. I will try to show in this chapter how the fractal applies to the brand of uncertainty that should bear Mandelbrot’s name: Mandelbrotian randomness.

The veins in leaves look like branches; branches look like trees; rocks look like small mountains. There is no qualitative change when an object changes size. If you look at the coast of Britain from an airplane, it resembles what you see when you look at it with a magnifying glass. This character of self-affinity implies that one deceptively short and simple rule of iteration can be used, either by a computer or, more randomly, by Mother Nature, to build shapes of seemingly great complexity. This can come in handy for computer graphics, but, more important, it is how nature works. Mandelbrot designed the mathematical object now known as the Mandelbrot set, the most famous object in the history of mathematics. It became popular with followers of chaos theory because it generates pictures of ever increasing complexity by using a deceptively minuscule recursive rule; recursive means that something can be reapplied to itself infinitely. You can look at the set at smaller and smaller resolutions without ever reaching the limit; you will continue to see recognizable shapes. The shapes are never the same, yet they bear an affinity to one another, a strong family resemblance.
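The recursion is almost embarrassingly short. The sketch below applies the standard rule behind the Mandelbrot set – take z, square it, add c, repeat – and merely asks whether the orbit stays bounded; the cap of 100 iterations is an arbitrary editorial choice.

def in_mandelbrot_set(c: complex, max_iter: int = 100) -> bool:
    """Apply the single recursive rule z -> z*z + c; a point belongs to the set
    if its orbit stays bounded (here: |z| still at most 2 after max_iter steps)."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True

print(in_mandelbrot_set(0))        # True: the origin never escapes
print(in_mandelbrot_set(1 + 1j))   # False: this orbit blows up after two steps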

These objects play a role in aesthetics. Consider the following applications:

Visual arts: Most computer-generated objects are now based on some version of the Mandelbrotian fractal. We can also see fractals in architecture, paintings, and many works of visual art – of course, not consciously incorporated by the work’s creator.

Music: Slowly hum the four-note opening of Beethoven’s Fifth Symphony: ta-ta-ta-ta. Then replace each individual note with the same four-note opening, so that you end up with a measure of sixteen notes. You will see (or, rather, hear) that each smaller wave resembles the original larger one. Bach and Mahler, for instance, wrote submovements that resemble the larger movements of which they are a part.

Poetry: Emily Dickinson’s poetry, for instance, is fractal: the large resembles the small. It has, according to a commentator, “a consciously made assemblage of dictions, metres, rhetorics, gestures, and tones”.

Fractals initially made Benoît M. a pariah in the mathematical establishment. French mathematicians were horrified. What? Images? Mon dieu! It was like showing a porno movie to an assembly of devout Eastern Orthodox grandmothers in my ancestral village of Amioun. So Mandelbrot spent time as an intellectual refugee at an IBM research center in upstate New York. It was a f*** you money situation, as IBM let him do whatever he felt like doing.

But the general public (mostly computer geeks) got the point. Mandelbrot’s book The Fractal Geometry of Nature made a splash when it came out a quarter century ago. It spread through artistic circles and led to studies in aesthetics, architectural design, even large industrial applications. Benoît M. was even offered a position as a professor of medicine! Supposedly the lungs are self-similar. His talks were invaded by all sorts of artists, earning him the nickname the Rock Star of Mathematics. The computer age helped him become one of the most influential mathematicians in history, in terms of the applications of his work, way before his acceptance by the ivory tower. We will see that, in addition to its universality, his work offers an unusual attribute: it is remarkably easy to understand.

A few words on his biography. Mandelbrot came to France from Warsaw in 1936, at the age of twelve. Owing to the vicissitudes of a clandestine life during Nazi-occupied France, he was spared some of the conventional Gallic education with its uninspiring algebraic drills, becoming largely self-taught. He was later deeply influenced by his uncle Szolem, a prominent member of the French mathematical establishment and holder of a chair at the Collège de France. Benoît M. later settled in the United States, working most of his life as an industrial scientist, with a few transitory and varied academic appointments.

The computer played two roles in the new science Mandelbrot helped conceive. First, fractal objects, as we have seen, can be generated with a simple rule applied to itself, which makes them ideal for the automatic activity of a computer (or Mother Nature). Second, in the generation of visual intuitions lies a dialectic between the mathematician and the objects generated.

Now let us see how this takes us to randomness. In fact, it is with probability that Mandelbrot started his career.

A Visual Approach to Extremistan/Mediocristan

I am looking at the rug in my study. If I examine it with a microscope, I will see a very rugged terrain. If I look at it with a magnifying glass, the terrain will be smoother but still highly uneven. But when I look at it from a standing position, it appears uniform – it is almost as smooth as a sheet of paper. The rug at eye level corresponds to Mediocristan and the law of large numbers: I am seeing the sum of undulations, and these iron out. This is like Gaussian randomness: the reason my cup of coffee does not jump is that the sum of all of its moving particles becomes smooth. Likewise, you reach certainties by adding up small Gaussian uncertainties: this is the law of large numbers.

The Gaussian is not self-similar, and that is why my coffee cup does not jump on my desk.

Now, consider a trip up a mountain. No matter how high you go on the surface of the earth, it will remain jagged. This is even true at a height of 30,000 feet. When you are flying above the Alps, you will still see jagged mountains in place of small stones. So some surfaces are not from Mediocristan, and changing the resolution does not make them much smoother. (Note that this effect only disappears when you go up to more extreme heights. Our planet looks smooth to an observer from space, but this is because it is too small. If it were a bigger planet, then it would have mountains that would dwarf the Himalayas, and it would require observation from a greater distance for it to look smooth. Likewise, if the planet had a larger population, even maintaining the same average wealth, we would be likely to find someone whose net worth would vastly surpass that of Bill Gates.)

FIGURE 11

Apparently, a lens cap has been dropped on the ground. Now turn the page.

Figures 11 and 12 illustrate the above point: an observer looking at the first picture might think that a lens cap has fallen on the ground.

Recall our brief discussion of the coast of Britain. If you look at it from an airplane, its contours are not so different from the contours you see on the shore. The change in scaling does not alter the shapes or their degree of smoothness.

Pearls to Swine

What does fractal geometry have to do with the distribution of wealth, the size of cities, returns in the financial markets, the number of casualties in war, or the size of planets? Let us connect the dots.

FIGURE 12

The object is not in fact a lens cap. These two photos illustrate scale invariance: the terrain is fractal. Compare it to man-made objects such as a car or a house. Source: Professor Stephen W. Wheatcraft, University of Nevada Reno.

The key here is that the fractal has numerical or statistical measures that are (somewhat) preserved across scales – the ratio is the same, unlike the Gaussian. Another view of such self-similarity is presented in Figure 13. As we saw in Chapter 15, the superrich are similar to the rich, only richer – wealth is scale independent, or, more precisely, of unknown scale dependence.

FIGURE 13: THE PURE FRACTAL STATISTICAL MOUNTAIN

The degree of inequality will be the same in all sixteen subsections of the graph. In the Gaussian world, disparities in wealth (or any other quantity) decrease when you look at the upper end – so billionaires should be more equal in relation to one another than millionaires are, and millionaires more equal in relation to one another than the middle class. This lack of equality at all wealth levels, in a nutshell, is statistical self-similarity.

In the 1960s Mandelbrot presented his ideas on the prices of commodities and financial securities to the economics establishment, and the financial economists got all excited. In 1963 the then dean of the University of Chicago Graduate School of Business, George Shultz, offered him a professorship. This is the same George Shultz who later became Ronald Reagan’s secretary of state.

Shultz called him one evening to rescind the offer.

At the time of writing, forty-four years later, nothing has happened in economics and social science statistics – except for some cosmetic fiddling that treats the world as if we were subject only to mild randomness – and yet Nobel medals were being distributed. Some papers were written offering “evidence” that Mandelbrot was wrong by people who do not get the central argument of this book – you can always produce data “corroborating” that the underlying process is Gaussian by finding periods that do not have rare events, just like you can find an afternoon during which no one killed anyone and use it as “evidence” of honest behavior. I will repeat that, because of the asymmetry with induction, just as it is easier to reject innocence than accept it, it is easier to reject a bell curve than accept it; conversely, it is more difficult to reject a fractal than to accept it. Why? Because a single event can destroy the argument that we face a Gaussian bell curve.

In sum, four decades ago, Mandelbrot gave pearls to economists and résumé-building philistines, which they rejected because the ideas were too good for them. It was, as the saying goes, margaritas ante porcos, pearls before swine.


In the rest of this chapter I will explain how I can endorse Mandelbrotian fractals as a representation of much of randomness without necessarily accepting their precise use. Fractals should be the default, the approximation, the framework. They do not solve the Black Swan problem and do not turn all Black Swans into predictable events, but they significantly mitigate the Black Swan problem by making such large events conceivable. (This makes them gray. Why gray? Because only the Gaussian gives you certainties. More on that, later.)

THE LOGIC OF FRACTAL RANDOMNESS (WITH A WARNING)[55]

I have shown in the wealth lists in Chapter 15 the logic of a fractal distribution: if wealth doubles from 1 million to 2 million, the incidence of people with at least that much money is cut in four, which is an exponent of two. If the exponent were one, then the incidence of that wealth or more would be cut in two. The exponent is called the “power” (which is why some people use the term power law). Let us call the number of occurrences higher than a certain level an “exceedance” – an exceedance of two million is the number of persons with wealth more than two million. One main property of these fractals (or another way to express their main property, scalability) is that the ratio of two exceedances[56] is going to be the ratio of the two numbers to the negative power of the power exponent.

Let us illustrate this. Say that you “think” that only 96 books a year will sell more than 250,000 copies (which is what happened last year), and that you “think” that the exponent is around 1.5. You can extrapolate to estimate that around 34 books will sell more than 500,000 copies – simply 96 times (500,000/250,000)^-1.5. We can continue, and note that around 12 books should sell more than a million copies, here 96 times (1,000,000/250,000)^-1.5.
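The extrapolation is one line of arithmetic. A minimal sketch, taking the 96 books and the 1.5 exponent as given (they are this chapter’s working assumptions, not measured facts):

def exceedance(n_base: float, x_base: float, x: float, exponent: float) -> float:
    """Expected count above x, scaled from a known count above x_base,
    assuming a pure power law with the given tail exponent."""
    return n_base * (x / x_base) ** (-exponent)

print(exceedance(96, 250_000, 500_000, 1.5))     # about 34 books above 500,000 copies
print(exceedance(96, 250_000, 1_000_000, 1.5))   # 12 books above a million copies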

Let me show the different measured exponents for a variety of phenomena.

TABLE 2: ASSUMED EXPONENTS FOR VARIOUS PHENOMENA[57]

Phenomenon Assumed Exponent (vague approximation)
Frequency of use of words 1.2
Number of hits on websites 1.4
Number of books sold in the U.S. 1.5
Telephone calls received 1.22
Magnitude of earthquakes 2.8
Diameter of moon craters 2.14
Intensity of solar flares 0.8
Intensity of wars 0.8
Net worth of Americans 1.1
Number of persons per family name 1
Population of U.S. cities 1.3
Market moves 3 (or lower)
Company size 1.5
People killed in terrorist attacks 2 (but possibly a much lower exponent)

Let me tell you upfront that these exponents mean very little in terms of numerical precision. We will see why in a minute, but just note for now that we do not observe these parameters; we simply guess them, or infer them from statistical information, which makes it hard at times to know the true parameters – if they in fact exist. Let us first examine the practical consequences of an exponent.

Table 3 illustrates the impact of the highly improbable. It shows the contributions of the top 1 percent and 20 percent to the total. The lower the exponent, the higher those contributions. But look how sensitive the process is: between 1.1 and 1.3 you go from 66 percent of the total to 34 percent. Just a 0.2 difference in the exponent changes the result dramatically – and such a difference can come from a simple measurement error. This difference is not trivial: just consider that we have no precise idea what the exponent is because we cannot measure it directly. All we do is estimate from past data or rely on theories that allow for the building of some model that would give us some idea – but these models may have hidden weaknesses that prevent us from blindly applying them to reality.

TABLE 3: THE MEANING OF THE EXPONENT

Exponent Share of the top 1% Share of the top 20%
1 99.99%[58] 99.99%
1.1 66% 86%
1.2 47% 76%
1.3 34% 69%
1.4 27% 63%
1.5 22% 58%
2 10% 45%
2.5 6% 38%
3 4.6% 34%
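If one assumes a pure Pareto distribution – which appears to be what underlies the table – the share of the total held by the top fraction p is p raised to the power (1 – 1/exponent), for exponents above one (the exponent-one row is a degenerate limiting case). A minimal sketch reproducing the rows:

def top_share(p: float, exponent: float) -> float:
    """Share of the total held by the top fraction p, assuming a pure Pareto
    with tail exponent greater than 1 (the exponent = 1 row is a degenerate limit)."""
    return p ** (1 - 1 / exponent)

for exponent in (1.1, 1.2, 1.3, 1.4, 1.5, 2, 2.5, 3):
    print(exponent, round(top_share(0.01, exponent), 3), round(top_share(0.20, exponent), 3))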

So keep in mind that the 1.5 exponent is an approximation, that it is hard to compute, that you do not get it from the gods, at least not easily, and that you will have a monstrous sampling error. You will observe that the number of books selling above a million copies is not always going to be 12 – it could be as high as 20, or as low as 2.

More significantly, this exponent begins to apply at some number called “crossover”, and addresses numbers larger than this crossover. It may start at 200,000 books, or perhaps only 400,000 books. Likewise, wealth has different properties above, say, $600 million, where inequality grows, than it does below such a number. How do you know where the crossover point is? This is a problem. My colleagues and I worked with around 20 million pieces of financial data. We all had the same data set, yet we never agreed on exactly what the exponent was in our sets. We knew the data revealed a fractal power law, but we learned that one could not produce a precise number. But what we did know – that the distribution is scalable and fractal – was sufficient for us to operate and make decisions.

The Problem of the Upper Bound

Some people have researched and accepted the fractal “up to a point”. They argue that wealth, book sales, and market returns all have a certain level at which things stop being fractal. “Truncation” is what they propose. I agree that there is a level where fractality might stop, but where? Saying that there is an upper limit but that I don’t know how high it is, and saying that there is no limit, carry the same consequences in practice. Proposing an upper limit is highly unsafe. You may say, Let us cap wealth at $150 billion in our analyses. Then someone else might say, Why not $151 billion? Or why not $152 billion? We might as well consider that the variable is unlimited.

Beware the Precision

I have learned a few tricks from experience: whichever exponent I try to measure is likely to be overestimated (recall that a higher exponent implies a smaller role for large deviations) – what you see is likely to be less Black Swannish than what you do not see. I call this the masquerade problem.

Let’s say I generate a process that has an exponent of 1.7. You do not see what is inside the engine, only the data coming out. If I ask you what the exponent is, odds are that you will compute something like 2.4. You would do so even if you had a million data points. The reason is that it takes a long time for some fractal processes to reveal their properties, and you underestimate the severity of the shock.

Sometimes a fractal can make you believe that it is Gaussian, particularly when the cutpoint starts at a high number. With fractal distributions, extreme deviations of that kind are rare enough to smoke you: you don’t recognize the distribution as fractal.
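One related, easy-to-check facet of the masquerade – not the exact experiment above, but the same moral – is that a finite sample from a fractal process understates what the process can deliver: its record keeps exploding as you collect more data, while the Gaussian’s barely budges. A sketch, assuming a pure Pareto with exponent 1.7 on the fractal side:

import numpy as np

rng = np.random.default_rng(0)
exponent = 1.7
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    fractal = rng.pareto(exponent, n) + 1.0    # power-law draws with minimum 1
    gaussian = rng.normal(0.0, 1.0, n)
    # The fractal maximum grows roughly like n**(1/exponent);
    # the Gaussian maximum creeps up only logarithmically.
    print(n, round(float(fractal.max()), 1), round(float(gaussian.max()), 1))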

The Water Puddle Revisited

As you have seen, we have trouble knowing the parameters of whichever model we assume runs the world. So with Extremistan, the problem of induction pops up again, this time even more significantly than at any previous time in this book. Simply, if a mechanism is fractal it can deliver large values; therefore the incidence of large deviations is possible, but how possible, how often they should occur, will be hard to know with any precision. This is similar to the water puddle problem: plenty of ice cubes could have generated it. As someone who goes from reality to possible explanatory models, I face a completely different spate of problems from those who do the opposite.

I have just read three “popular science” books that summarize the research in complex systems: Mark Buchanan’s Ubiquity, Philip Ball’s Critical Mass, and Paul Ormerod’s Why Most Things Fail. These three authors present the world of social science as full of power laws, a view with which I most certainly agree. They also claim that there is universality of many of these phenomena, that there is a wonderful similarity between various processes in nature and the behavior of social groups, which I agree with. They back their studies with the various theories on networks and show the wonderful correspondence between the so-called critical phenomena in natural science and the self-organization of social groups. They bring together processes that generate avalanches, social contagions, and what they call informational cascades, which I agree with.

Universality is one of the reasons physicists find power laws associated with critical points particularly interesting. There are many situations, both in dynamical systems theory and statistical mechanics, where many of the properties of the dynamics around critical points are independent of the details of the underlying dynamical system. The exponent at the critical point may be the same for many systems in the same group, even though many other aspects of the system are different. I almost agree with this notion of universality. Finally, all three authors encourage us to apply techniques from statistical physics, avoiding econometrics and Gaussian-style nonscalable distributions like the plague, and I couldn’t agree more.

But all three authors, by producing, or promoting precision, fall into the trap of not differentiating between the forward and the backward processes (between the problem and the inverse problem) – to me, the greatest scientific and epistemological sin. They are not alone; nearly everyone who works with data but doesn’t make decisions on the basis of these data tends to be guilty of the same sin, a variation of the narrative fallacy. In the absence of a feedback process you look at models and think that they confirm reality. I believe in the ideas of these three books, but not in the way they are being used – and certainly not with the precision the authors ascribe to them. As a matter of fact, complexity theory should make us more suspicious of scientific claims of precise models of reality. It does not make all the swans white; that is predictable: it makes them gray, and only gray.

As I have said earlier, the world, epistemologically, is literally a different place to a bottom-up empiricist. We don’t have the luxury of sitting down to read the equation that governs the universe; we just observe data and make an assumption about what the real process might be, and “calibrate” by adjusting our equation in accordance with additional information. As events present themselves to us, we compare what we see to what we expected to see. It is usually a humbling process, particularly for someone aware of the narrative fallacy, to discover that history runs forward, not backward. As much as one thinks that businessmen have big egos, these people are often humbled by reminders of the differences between decision and results, between precise models and reality.

What I am talking about is opacity, incompleteness of information, the invisibility of the generator of the world. History does not reveal its mind to us – we need to guess what’s inside of it.

From Representation to Reality

The above idea links all the parts of this book. While many study psychology, mathematics, or evolutionary theory and look for ways to take it to the bank by applying their ideas to business, I suggest the exact opposite: study the intense, uncharted, humbling uncertainty in the markets as a means to get insights about the nature of randomness that is applicable to psychology, probability, mathematics, decision theory, and even statistical physics. You will see the sneaky manifestations of the narrative fallacy, the ludic fallacy, and the great errors of Platonicity, of going from representation to reality.

When I first met Mandelbrot I asked him why an established scientist like him who should have more valuable things to do with his life would take an interest in such a vulgar topic as finance. I thought that finance and economics were just a place where one learned from various empirical phenomena and filled up one’s bank account with f*** you cash before leaving for bigger and better things. Mandelbrot’s answer was, “Data, a gold mine of data”. Indeed, everyone forgets that he started in economics before moving on to physics and the geometry of nature. Working with such abundant data humbles us; it provides the intuition of the following error: traveling the road between representation and reality in the wrong direction.

The problem of the circularity of statistics (which we can also call the statistical regress argument) is as follows. Say you need past data to discover whether a probability distribution is Gaussian, fractal, or something else. You will need to establish whether you have enough data to back up your claim. How do we know if we have enough data? From the probability distribution – a distribution does tell you whether you have enough data to “build confidence” about what you are inferring. If it is a Gaussian bell curve, then a few points will suffice (the law of large numbers once again). And how do you know if the distribution is Gaussian? Well, from the data. So we need the data to tell us what the probability distribution is, and a probability distribution to tell us how much data we need. This causes a severe regress argument.

This regress does not occur if you assume beforehand that the distribution is Gaussian. It happens that, for some reason, the Gaussian yields its properties rather easily. Extremistan distributions do not do so. So selecting the Gaussian while invoking some general law appears to be convenient. The Gaussian is used as a default distribution for that very reason. As I keep repeating, assuming its application beforehand may work with a small number of fields such as crime statistics, mortality rates, matters from Mediocristan. But not for historical data of unknown attributes and not for matters from Extremistan.

Now, why aren’t statisticians who work with historical data aware of this problem? First, they do not like to hear that their entire business has been canceled by the problem of induction. Second, they are not confronted with the results of their predictions in rigorous ways. As we saw with the Makridakis competition, they are grounded in the narrative fallacy, and they do not want to hear it.

ONCE AGAIN, BEWARE THE FORECASTERS

Let me take the problem one step higher up. As I mentioned earlier, plenty of fashionable models attempt to explain the genesis of Extremistan. In fact, they are grouped into two broad classes, but there are occasionally more approaches. The first class includes the simple rich-get-richer (or big-get-bigger) style model that is used to explain the lumping of people around cities, the market domination of Microsoft and VHS (instead of Apple and Betamax), the dynamics of academic reputations, etc. The second class concerns what are generally called “percolation models”, which address not the behavior of the individual, but rather the terrain in which he operates. When you pour water on a porous surface, the structure of that surface matters more than does the liquid. When a grain of sand hits a pile of other grains of sand, how the terrain is organized is what determines whether there will be an avalanche.

Most models, of course, attempt to be precisely predictive, not just descriptive; I find this infuriating. They are nice tools for illustrating the genesis of Extremistan, but I insist that the “generator” of reality does not appear to obey them closely enough to make them helpful in precise forecasting. At least to judge by anything you find in the current literature on the subject of Extremistan. Once again we face grave calibration problems, so it would be a great idea to avoid the common mistakes made while calibrating a nonlinear process. Recall that nonlinear processes have greater degrees of freedom than linear ones (as we saw in Chapter 11), with the implication that you run a great risk of using the wrong model. Yet once in a while you run into a book or articles advocating the application of models from statistical physics to reality. Beautiful books like Philip Ball’s illustrate and inform, but they should not lead to precise quantitative models. Do not take them at face value.

But let us see what we can take home from these models.

Once Again, a Happy Solution

First, in assuming a scalable, I accept that an arbitrarily large number is possible. In other words, inequalities should not stop above some known maximum bound.

Say that the book The Da Vinci Code sold around 60 million copies. (The Bible sold about a billion copies but let’s ignore it and limit our analysis to lay books written by individual authors.) Although we have never known a lay book to sell 200 million copies, we can consider that the possibility is not zero. It’s small, but it’s not zero. For every three Da Vinci Code-style bestsellers, there might be one superbestseller, and though one has not happened so far, we cannot rule it out. And for every fifteen Da Vinci Codes there will be one superbestseller selling, say, 500 million copies.

Apply the same logic to wealth. Say the richest person on earth is worth $50 billion. There is a nonnegligible probability that next year someone with $100 billion or more will pop out of nowhere. For every three people with more than $50 billion, there could be one with $100 billion or more. There is a much smaller probability of there being someone with more than $200 billion – one third of the previous probability, but nevertheless not zero. There is even a minute, but not zero probability of there being someone worth more than $500 billion.
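The arithmetic above can be written out as a short sketch. Translating the "one in three for every doubling" ratio into a tail exponent (roughly 1.58) is my own step, not a number given in the text; the point is only that the small probabilities never reach zero.

```python
# Sketch of the scaling logic for wealth, taking the one-in-three-per-doubling
# ratio at face value.
import math

# If P(X > 2x) / P(X > x) = 1/3 at every level, the survival function is a
# power law P(X > x) ~ x**(-alpha) with alpha = log(3) / log(2) ~ 1.58.
alpha = math.log(3) / math.log(2)

def relative_odds(x, base=50e9):
    """Probability of a fortune exceeding x, relative to exceeding $50 billion."""
    return (x / base) ** (-alpha)

for level in (100e9, 200e9, 500e9):
    print(f"> ${level / 1e9:.0f}B : {relative_odds(level):.3f} of the > $50B odds")
# > $100B: about 0.333, > $200B: about 0.111, > $500B: about 0.026 --
# ever smaller, but never zero.
```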

This tells me the following: I can make inferences about things that I do not see in my data, but these things should still belong to the realm of possibilities. There is an invisible bestseller out there, one that is absent from the past data but that you need to account for. Recall my point in Chapter 13: it makes investment in a book or a drug better than statistics on past data might suggest. But it can make stock market losses worse than what the past shows.

Wars are fractal in nature. A war that kills more people than the devastating Second World War is possible – not likely, but not a zero probability, although such a war has never happened in the past.

Second, I will introduce an illustration from nature that will help to make the point about precision. A mountain is somewhat similar to a stone: it has an affinity with a stone, a family resemblance, but it is not identical. The word to describe such resemblances is self-affine, not the precise self-similar, but Mandelbrot had trouble communicating the notion of affinity, and the term self-similar spread with its connotation of precise resemblance rather than family resemblance. As with the mountain and the stone, the distribution of wealth above $1 billion is not exactly the same as that below $1 billion, but the two distributions have “affinity”.

Third, I said earlier that there have been plenty of papers in the world of econophysics (the application of statistical physics to social and economic phenomena) aiming at such calibration, at pulling numbers from the world of phenomena. Many try to be predictive. Alas, we are not able to predict “transitions” into crises or contagions. My friend Didier Sornette attempts to build predictive models, which I love, except that I cannot use them to make predictions – but please don’t tell him; he might stop building them. That I can’t use them as he intends does not invalidate his work; it just means that the interpretations require broad-minded thinking, unlike models in conventional economics that are fundamentally flawed. We may be able to do well with some of Sornette’s phenomena, but not all.

WHERE IS THE GRAY SWAN?

I have written this entire book about the Black Swan. This is not because I am in love with the Black Swan; as a humanist, I hate it. I hate most of the unfairness and damage it causes. Thus I would like to eliminate many Black Swans, or at least to mitigate their effects and be protected from them. Fractal randomness is a way to reduce these surprises, to make some of the swans appear possible, so to speak, to make us aware of their consequences, to make them gray. But fractal randomness does not yield precise answers. The benefits are as follows. If you know that the stock market can crash, as it did in 1987, then such an event is not a Black Swan. The crash of 1987 is not an outlier if you use a fractal with an exponent of three. If you know that biotech companies can deliver a megablockbuster drug, bigger than all we’ve had so far, then it won’t be a Black Swan, and you will not be surprised, should that drug appear.
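To see how an exponent of three turns the crash from an impossibility into a mere rarity, here is a small comparison of tail weights. The twenty-standard-deviation size comes from the discussion of the 1987 crash later in this chapter; the calibration point at two sigmas is my own illustrative assumption.

```python
# Sketch: Gaussian tail versus a power-law tail with exponent three, anchored
# so that both agree on the probability of an ordinary 2-sigma day.
import math

def gaussian_tail(k):
    """P(Z > k) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def cubic_tail(k, k0=2.0):
    """Power-law survival P(X > k) = P(X > k0) * (k / k0) ** -3."""
    return gaussian_tail(k0) * (k / k0) ** -3

for k in (5, 10, 20):
    g, c = gaussian_tail(k), cubic_tail(k)
    print(f"{k:>2} sigma: Gaussian {g:.2e}  vs  cubic tail {c:.2e}  (ratio {c / g:.1e})")
# Under the Gaussian, a 20-sigma day is essentially impossible (~1e-89);
# under a cubic tail it is merely rare -- the sense in which the 1987 crash
# stops being an outlier.
```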

Thus Mandelbrot’s fractals allow us to account for a few Black Swans, but not all. I said earlier that some Black Swans arise because we ignore sources of randomness. Others arise when we overestimate the fractal exponent. A gray swan concerns modelable extreme events, a black swan is about unknown unknowns.

I sat down and discussed this with the great man, and it became, as usual, a linguistic game. In Chapter 9 I presented the distinction economists make between Knightian uncertainty (incomputable) and Knightian risk (computable); this distinction cannot be so original an idea to be absent in our vocabulary, and so we looked for it in French. Mandelbrot mentioned one of his friends and prototypical heroes, the aristocratic mathematician Marcel-Paul Schützenberger, a fine erudite who (like this author) was easily bored and could not work on problems beyond their point of diminishing returns. Schützenberger insisted on the clear-cut distinction in the French language between hasard and fortuit. Hasard, from the Arabic az-zahr, implies, like alea, dice – tractable randomness; fortuit is my Black Swan – the purely accidental and unforeseen. We went to the Petit Robert dictionary; the distinction effectively exists there. Fortuit seems to correspond to my epistemic opacity, l’imprévu et non quantifiable; hasard to the more ludic type of uncertainty that was proposed by the Chevalier de Méré in the early gambling literature. Remarkably, the Arabs may have introduced another word to the business of uncertainty: rizk, meaning property.

I repeat: Mandelbrot deals with gray swans; I deal with the Black Swan. So Mandelbrot domesticated many of my Black Swans, but not all of them, not completely. But he shows us a glimmer of hope with his method, a way to start thinking about the problems of uncertainty. You are indeed much safer if you know where the wild animals are.

Chapter Seventeen: LOCKE’S MADMEN, OR BELL CURVES IN THE WRONG PLACES[59]

What? – Anyone can become president – Alfred Nobel’s legacy – Those medieval days

I have in my house two studies: one real, with interesting books and literary material; the other nonliterary, where I do not enjoy working, where I relegate matters prosaic and narrowly focused. In the nonliterary study is a wall full of books on statistics and the history of statistics, books I never had the fortitude to burn or throw away, though I find them largely useless outside of their academic applications (Carneades, Cicero, and Foucher know a lot more about probability than all these pseudosophisticated volumes). I cannot use them in class because I promised myself never to teach trash, even if dying of starvation. Why can’t I use them? Not one of these books deals with Extremistan. Not one. The few books that do are not by statisticians but by statistical physicists. We are teaching people methods from Mediocristan and turning them loose in Extremistan. It is like developing a medicine for plants and applying it to humans. It is no wonder that we run the biggest risk of all: we handle matters that belong to Extremistan but treat them as if they belonged to Mediocristan, as an “approximation”.

Several hundred thousand students in business schools and social science departments from Singapore to Urbana-Champaign, as well as people in the business world, continue to study “scientific” methods, all grounded in the Gaussian, all embedded in the ludic fallacy.

This chapter examines disasters stemming from the application of phony mathematics to social science. The real topic might be the dangers to our society brought about by the Swedish academy that awards the Nobel Prize.

Only Fifty Years

Let us return to the story of my business life. Look at the graph in Figure 14. In the last fifty years, the ten most extreme days in the financial markets represent half the returns. Ten days in fifty years. Meanwhile, we are mired in chitchat.


FIGURE 14

By removing the ten biggest one-day moves from the U.S. stock market over the past fifty years, we see a huge difference in returns – and yet conventional finance sees these one-day jumps as mere anomalies. (This is only one of many such tests. While it is quite convincing on a casual read, there are many more-convincing ones from a mathematical standpoint, such as the incidence of 10 sigma events.)
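The procedure behind the figure is easy to reproduce as a sketch. The data-loading step is hypothetical; plug in any long daily return series (the stand-in series below is only there so the code runs).

```python
# Sketch of the test behind Figure 14: drop the ten most extreme one-day moves
# from a long daily return series and compare cumulative performance.
import numpy as np

def cumulative_growth(returns):
    """Growth of $1 invested, compounding simple daily returns."""
    return float(np.prod(1.0 + returns))

def drop_extremes(returns, n=10):
    """Remove the n largest one-day moves by absolute size."""
    keep = np.argsort(np.abs(returns))[:-n]
    return returns[keep]

# daily_returns = load_fifty_years_of_daily_returns()   # hypothetical helper
daily_returns = np.random.default_rng(0).standard_t(df=3, size=12_500) * 0.01  # stand-in

full = cumulative_growth(daily_returns)
trimmed = cumulative_growth(drop_extremes(daily_returns, n=10))
print(f"with all days: {full:.2f}x   without the 10 biggest moves: {trimmed:.2f}x")
# On real market data the gap is dramatic; on a tame, thin-tailed series it
# barely registers -- which is the point of the figure.
```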

Clearly, anyone who wants more than the high number of six sigma as proof that markets are from Extremistan needs to have his head examined. Dozens of papers show the inadequacy of the Gaussian family of distributions and the scalable nature of markets. Recall that, over the years, I myself have run statistics backward and forward on 20 million pieces of data that made me despise anyone talking about markets in Gaussian terms. But people have a hard time making the leap to the consequences of this knowledge.

The strangest thing is that people in business usually agree with me when they listen to me talk or hear me make my case. But when they go to the office the next day they revert to the Gaussian tools so entrenched in their habits. Their minds are domain-dependent, so they can exercise critical thinking at a conference while not doing so in the office. Furthermore, the Gaussian tools give them numbers, which seem to be “better than nothing”. The resulting measure of future uncertainty satisfies our ingrained desire to simplify even if that means squeezing into one single number matters that are too rich to be described that way.

The Clerks’ Betrayal

I ended Chapter 1 with the stock market crash of 1987, which allowed me to aggressively pursue my Black Swan idea. Right after the crash, when I stated that those using sigmas (i.e., standard deviations) as a measure of the degree of risk and randomness were charlatans, everyone agreed with me. If the world of finance were Gaussian, an episode such as the crash (more than twenty standard deviations) would take place every several billion lifetimes of the universe (look at the height example in Chapter 15). In the wake of the events of 1987, people accepted that rare events take place and are the main source of uncertainty. They were just unwilling to give up on the Gaussian as a central measurement tool – “Hey, we have nothing else”. People want a number to anchor on. Yet the two methods are logically incompatible.
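A back-of-the-envelope check of the "several billion lifetimes of the universe" claim, assuming one observation per trading day (roughly 252 per year, my assumption); only the order of magnitude matters here.

```python
# How long you would wait, on average, to see a single move of more than
# twenty standard deviations if daily returns really were Gaussian.
import math

p = 0.5 * math.erfc(20 / math.sqrt(2))      # P(Z > 20), roughly 2.8e-89
waiting_years = 1.0 / (p * 252)             # expected wait, in years
universe_age_years = 1.4e10                 # rough age of the universe

print(f"P(one-day move > 20 sigma) ~ {p:.1e}")
print(f"expected wait ~ {waiting_years:.1e} years, "
      f"or ~ {waiting_years / universe_age_years:.1e} lifetimes of the universe")
# The wait dwarfs even "several billion" lifetimes of the universe -- yet such
# a day actually happened.
```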

Unbeknownst to me, 1987 was not the first time the idea of the Gaussian was shown to be lunacy. Mandelbrot proposed the scalable to the economics establishment around 1960, and showed them how the Gaussian curve did not fit prices then. But after they got over their excitement, they realized that they would have to relearn their trade. One of the influential economists of the day, the late Paul Cootner, wrote, “Mandelbrot, like Prime Minister Churchill before him, promised us not Utopia, but blood, sweat, toil, and tears. If he is right, almost all our statistical tools are obsolete [or] meaningless”. I propose two corrections to Cootner’s statement. First, I would replace almost all with all. Second, I disagree with the blood and sweat business. I find Mandelbrot’s randomness considerably easier to understand than the conventional statistics. If you come fresh to the business, do not rely on the old theoretical tools, and do not have a high expectation of certainty.

Anyone Can Become President

And now a brief history of the “Nobel” Prize in economics, which was established by the Bank of Sweden in honor of Alfred Nobel, who may be, according to his family who wants the prize abolished, now rolling in his grave with disgust. An activist family member calls the prize a public relations coup by economists aiming to put their field on a higher footing than it deserves. True, the prize has gone to some valuable thinkers, such as the empirical psychologist Daniel Kahneman and the thinking economist Friedrich Hayek. But the committee has gotten into the habit of handing out Nobel Prizes to those who “bring rigor” to the process with pseudo-science and phony mathematics. After the stock market crash, they rewarded two theoreticians, Harry Markowitz and William Sharpe, who built beautifully Platonic models on a Gaussian base, contributing to what is called Modern Portfolio Theory. Simply, if you remove their Gaussian assumptions and treat prices as scalable, you are left with hot air. The Nobel Committee could have tested the Sharpe and Markowitz models – they work like quack remedies sold on the Internet – but nobody in Stockholm seems to have thought of it. Nor did the committee come to us practitioners to ask us our opinions; instead it relied on an academic vetting process that, in some disciplines, can be corrupt all the way to the marrow. After that award I made a prediction: “In a world in which these two get the Nobel, anything can happen. Anyone can become president”.

So the Bank of Sweden and the Nobel Academy are largely responsible for giving credence to the use of the Gaussian Modern Portfolio Theory as institutions have found it a great cover-your-behind approach. Software vendors have sold “Nobel crowned” methods for millions of dollars. How could you go wrong using it? Oddly enough, everyone in the business world initially knew that the idea was a fraud, but people get used to such methods. Alan Greenspan, the chairman of the Federal Reserve bank, supposedly blurted out, “I’d rather have the opinion of a trader than a mathematician”. Meanwhile, the Modern Portfolio Theory started spreading. I will repeat the following until I am hoarse: it is contagion that determines the fate of a theory in social science, not its validity.

I only realized later that Gaussian-trained finance professors were taking over business schools, and therefore MBA programs, and producing close to a hundred thousand students a year in the United States alone, all brainwashed by a phony portfolio theory. No empirical observation could halt the epidemic. It seemed better to teach students a theory based on the Gaussian than to teach them no theory at all. It looked more “scientific” than giving them what Robert C. Merton (the son of the sociologist Robert K. Merton we discussed earlier) called the “anecdote”. Merton wrote that before portfolio theory, finance was “a collection of anecdotes, rules of thumb, and manipulation of accounting data”. Portfolio theory allowed “the subsequent evolution from this conceptual potpourri to a rigorous economic theory”. For a sense of the degree of intellectual seriousness involved, and to compare neoclassical economics to a more honest science, consider this statement from the nineteenth-century father of modern medicine, Claude Bernard: “Facts for now, but with scientific aspirations for later”. You should send economists to medical school.

So the Gaussian[60] pervaded our business and scientific cultures, and terms such as sigma, variance, standard deviation, correlation, R square, and the eponymous Sharpe ratio, all directly linked to it, pervaded the lingo. If you read a mutual fund prospectus, or a description of a hedge fund’s exposure, odds are that it will supply you, among other information, with some quantitative summary claiming to measure “risk”. That measure will be based on one of the above buzzwords derived from the bell curve and its kin. Today, for instance, pension funds’ investment policy and choice of funds are vetted by “consultants” who rely on portfolio theory. If there is a problem, they can claim that they relied on standard scientific method.

More Horror

Things got a lot worse in 1997. The Swedish academy gave another round of Gaussian-based Nobel Prizes to Myron Scholes and Robert C. Merton, who had improved on an old mathematical formula and made it compatible with the existing grand Gaussian general financial equilibrium theories – hence acceptable to the economics establishment. The formula was now “useable”. It had a list of long forgotten “precursors”, among whom was the mathematician and gambler Ed Thorp, who had authored the bestselling Beat the Dealer, about how to get ahead in blackjack, but somehow people believe that Scholes and Merton invented it, when in fact they just made it acceptable. The formula was my bread and butter. Traders, bottom-up people, know its wrinkles better than academics by dint of spending their nights worrying about their risks, except that few of them could express their ideas in technical terms, so I felt I was representing them. Scholes and Merton made the formula dependent on the Gaussian, but their “precursors” subjected it to no such restriction.[61]

The postcrash years were entertaining for me, intellectually. I attended conferences in finance and mathematics of uncertainty; not once did I find a speaker, Nobel or no Nobel, who understood what he was talking about when it came to probability, so I could freak them out with my questions. They did “deep work in mathematics”, but when you asked them where they got their probabilities, their explanations made it clear that they had fallen for the ludic fallacy – there was a strange cohabitation of technical skills and absence of understanding that you find in idiot savants. Not once did I get an intelligent answer or one that was not ad hominem. Since I was questioning their entire business, it was understandable that I drew all manner of insults: “obsessive”, “commercial”, “philosophical”, “essayist”, “idle man of leisure”, “repetitive”, “practitioner” (this is an insult in academia), “academic” (this is an insult in business). Being on the receiving end of angry insults is not that bad; you can get quickly used to it and focus on what is not said. Pit traders are trained to handle angry rants. If you work in the chaotic pits, someone in a particularly bad mood from losing money might start cursing at you until he injures his vocal cords, then forget about it and, an hour later, invite you to his Christmas party. So you become numb to insults, particularly if you teach yourself to imagine that the person uttering them is a variant of a noisy ape with little personal control. Just keep your composure, smile, focus on analyzing the speaker not the message, and you’ll win the argument. An ad hominem attack against an intellectual, not against an idea, is highly flattering. It indicates that the person does not have anything intelligent to say about your message.

The psychologist Philip Tetlock (the expert buster in Chapter 10), after listening to one of my talks, reported that he was struck by the presence of an acute state of cognitive dissonance in the audience. But how people resolve this cognitive tension, as it strikes at the core of everything they have been taught and at the methods they practice, and realize that they will continue to practice, can vary a lot. It was symptomatic that almost all people who attacked my thinking attacked a deformed version of it, like “it is all random and unpredictable” rather than “it is largely random”, or got mixed up by showing me how the bell curve works in some physical domains. Some even had to change my biography.

At a panel in Lugano, Myron Scholes once got into a state of rage and went after a transformed version of my ideas. I could see pain in his face. Once, in Paris, a prominent member of the mathematical establishment, who invested part of his life in some minute sub-sub-property of the Gaussian, blew a fuse – right when I showed empirical evidence of the role of Black Swans in markets. He turned red with anger, had difficulty breathing, and started hurling insults at me for having desecrated the institution, lacking pudeur (modesty); he shouted “I am a member of the Academy of Science!” to give more strength to his insults. (The French translation of my book was out of stock the next day.) My best episode was when Steve Ross, an economist perceived to be an intellectual far superior to Scholes and Merton, and deemed a formidable debater, gave a rebuttal to my ideas by signaling small errors or approximations in my presentation, such as “Markowitz was not the first to …”, thus certifying that he had no answer to my main point. Others who had invested much of their lives in these ideas resorted to vandalism on the Web.

Economists often invoke a strange argument by Milton Friedman that states that models do not have to have realistic assumptions to be acceptable – giving them license to produce severely defective mathematical representations of reality. The problem of course is that these Gaussianizations do not have realistic assumptions and do not produce reliable results. They are neither realistic nor predictive. Also note a mental bias I encounter on occasion: people mistake an event with a small probability, say, one in twenty years, for a periodically occurring one. They think that they are safe if they are only exposed to it for ten years.

I had trouble getting the message about the difference between Mediocristan and Extremistan through – many arguments presented to me were about how society has done well with the bell curve – just look at credit bureaus, etc.

The only comment I found unacceptable was, “You are right; we need you to remind us of the weakness of these methods, but you cannot throw the baby out with the bath water”, meaning that I needed to accept their reductive Gaussian distribution while also accepting that large deviations could occur – they didn’t realize the incompatibility of the two approaches. It was as if one could be half dead. Not one of these users of portfolio theory, in twenty years of debates, explained how they could accept the Gaussian framework as well as large deviations. Not one.

Confirmation

Along the way I saw enough of the confirmation error to make Karl Popper stand up with rage. People would find data in which there were no jumps or extreme events, and show me a “proof” that one could use the Gaussian. This was exactly like my example of the “proof” that O.J. Simpson is not a killer in Chapter 5. The entire statistical business confused absence of proof with proof of absence. Furthermore, people did not understand the elementary asymmetry involved: you need one single observation to reject the Gaussian, but millions of observations will not fully confirm the validity of its application. Why? Because the Gaussian bell curve disallows large deviations, but tools of Extremistan, the alternative, do not disallow long quiet stretches.
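The asymmetry can be shown in a few lines. The data, window size, and the single 20-sigma jump are illustrative assumptions of mine; the mechanism is the point.

```python
# Sketch of the asymmetry: long quiet stretches are compatible with the
# Gaussian, but a single large observation is enough to reject it.
import numpy as np

rng = np.random.default_rng(7)
quiet = rng.normal(size=100_000)            # a long, well-behaved stretch
one_jump = np.append(quiet, 20.0)           # ...then a single 20-sigma day

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)     # 0 for an exact Gaussian

print("quiet stretch :", round(excess_kurtosis(quiet), 3))     # ~0: consistent with the Gaussian
print("with one jump :", round(excess_kurtosis(one_jump), 3))  # ~1.6: far beyond what a Gaussian
                                                               # sample this size can produce
# 100,000 tame observations could not confirm the Gaussian; one observation
# outside its reach is enough to throw it out.
```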

I did not know that Mandelbrot’s work mattered outside aesthetics and geometry. Unlike him, I was not ostracized: I got a lot of approval from practitioners and decision makers, though not from their research staffs.

But suddenly I got the most unexpected vindication.

IT WAS JUST A BLACK SWAN

Robert Merton, Jr., and Myron Scholes were founding partners in the large speculative trading firm called Long-Term Capital Management, or LTCM, which I mentioned in Chapter 4. It was a collection of people with top-notch résumés, from the highest ranks of academia. They were considered geniuses. The ideas of portfolio theory inspired their risk management of possible outcomes – thanks to their sophisticated “calculations”. They managed to enlarge the ludic fallacy to industrial proportions.

Then, during the summer of 1998, a combination of large events, triggered by a Russian financial crisis, took place that lay outside their models. It was a Black Swan. LTCM went bust and almost took down the entire financial system with it, as the exposures were massive. Since their models ruled out the possibility of large deviations, they allowed themselves to take a monstrous amount of risk. The ideas of Merton and Scholes, as well as those of Modern Portfolio Theory, were starting to go bust. The magnitude of the losses was spectacular, too spectacular to allow us to ignore the intellectual comedy. Many friends and I thought that the portfolio theorists would suffer the fate of tobacco companies: they were endangering people’s savings and would soon be brought to account for the consequences of their Gaussian-inspired methods.

None of that happened.

Instead, MBAs in business schools went on learning portfolio theory. And the option formula went on bearing the name Black-Scholes-Merton, instead of reverting to its true owners, Louis Bachelier, Ed Thorp, and others.

How to “Prove” Things

Merton the younger is a representative of the school of neoclassical economics, which, as we have seen with LTCM, represents most powerfully the dangers of Platonified knowledge.[62] Looking at his methodology, I see the following pattern. He starts with rigidly Platonic assumptions, completely unrealistic – such as the Gaussian probabilities, along with many more equally disturbing ones. Then he generates “theorems” and “proofs” from these. The math is tight and elegant. The theorems are compatible with other theorems from Modern Portfolio Theory, themselves compatible with still other theorems, building a grand theory of how people consume, save, face uncertainty, spend, and project the future. He assumes that we know the likelihood of events. The beastly word equilibrium is always present. But the whole edifice is like a game that is entirely closed, like Monopoly with all of its rules.

A scholar who applies such methodology resembles Locke’s definition of a madman: someone “reasoning correctly from erroneous premises”.

Now, elegant mathematics has this property: it is perfectly right, not 99 percent so. This property appeals to mechanistic minds who do not want to deal with ambiguities. Unfortunately you have to cheat somewhere to make the world fit perfect mathematics; and you have to fudge your assumptions somewhere. We have seen with the Hardy quote that professional “pure” mathematicians, however, are as honest as they come.

So where matters get confusing is when someone like Merton tries to be mathematical and airtight rather than focus on fitness to reality.

This is where you learn from the minds of military people and those who have responsibilities in security. They do not care about “perfect” ludic reasoning; they want realistic ecological assumptions. In the end, they care about lives.

I mentioned in Chapter 11 how those who started the game of “formal thinking”, by manufacturing phony premises in order to generate “rigorous” theories, were Paul Samuelson, Merton’s tutor, and, in the United Kingdom, John Hicks. These two wrecked the ideas of John Maynard Keynes, which they tried to formalize (Keynes was interested in uncertainty, and complained about the mind-closing certainties induced by models). Other participants in the formal thinking venture were Kenneth Arrow and Gerard Debreu. All four were Nobeled. All four were in a delusional state under the effect of mathematics – what Dieudonné called “the music of reason”, and what I call Locke’s madness. All of them can be safely accused of having invented an imaginary world, one that lent itself to their mathematics. The insightful scholar Martin Shubik, who held that the degree of excessive abstraction of these models, a few steps beyond necessity, makes them totally unusable, found himself ostracized, a common fate for dissenters.[63]

If you question what they do, as I did with Merton Jr., they will ask for “tight proof”. So they set the rules of the game, and you need to play by them. Coming from a practitioner background in which the principal asset is being able to work with messy, but empirically acceptable, mathematics, I cannot accept a pretense of science. I much prefer a sophisticated craft, focused on tricks, to a failed science looking for certainties. Or could these neoclassical model builders be doing something worse? Could it be that they are involved in what Bishop Huet calls the manufacturing of certainties?

Let us see.

Skeptical empiricism advocates the opposite method. I care about the premises more than the theories, and I want to minimize reliance on theories, stay light on my feet, and reduce my surprises. I want to be broadly right rather than precisely wrong. Elegance in the theories is often indicative of Platonicity and weakness – it invites you to seek elegance for elegance’s sake. A theory is like medicine (or government): often useless, sometimes necessary, always self-serving, and on occasion lethal. So it needs to be used with care, moderation, and close adult supervision.

TABLE 4: TWO WAYS TO APPROACH RANDOMNESS

Skeptical Empiricism and the a-Platonic School | The Platonic Approach
Interested in what lies outside the Platonic fold | Focuses on the inside of the Platonic fold
Respect for those who have the guts to say “I don’t know” | “You keep criticizing these models. These models are all we have”.
Fat Tony | Dr. John
Thinks of Black Swans as a dominant source of randomness | Thinks of ordinary fluctuations as a dominant source of randomness, with jumps as an afterthought
Bottom-up | Top-down
Would ordinarily not wear suits (except to funerals) | Wears dark suits, white shirts; speaks in a boring tone
Prefers to be broadly right | Precisely wrong
Minimal theory, considers theorizing as a disease to resist | Everything needs to fit some grand, general socioeconomic model and “the rigor of economic theory”; frowns on the “descriptive”
Does not believe that we can easily compute probabilities | Built their entire apparatus on the assumption that we can compute probabilities
Model: Sextus Empiricus and the school of evidence-based, minimum-theory empirical medicine | Model: Laplacian mechanics, the world and the economy like a clock
Develops intuitions from practice, goes from observations to books | Relies on scientific papers, goes from books to practice
Not inspired by any science, uses messy mathematics and computational methods | Inspired by physics, relies on abstract mathematics
Ideas based on skepticism, on the unread books in the library | Ideas based on beliefs, on what they think they know
Assumes Extremistan as a starting point | Assumes Mediocristan as a starting point
Sophisticated craft | Poor science
Seeks to be approximately right across a broad set of eventualities | Seeks to be perfectly right in a narrow model, under precise assumptions

The distinction in the above table between my model of the modern, skeptical empiricist and what Samuelson’s puppies represent can be generalized across disciplines.


I’ve presented my ideas in finance because that’s where I refined them. Let us now examine a category of people expected to be more thoughtful: the philosophers.

Chapter Eighteen: THE UNCERTAINTY OF THE PHONY

Philosophers in the wrong places – Uncertainty about (mostly) lunch – What I don’t care about – Education and intelligence

This final chapter of Part Three focuses on a major ramification of the ludic fallacy: how those whose job it is to make us aware of uncertainty fail us and divert us into bogus certainties through the back door.

LUDIC FALLACY REDUX

I have explained the ludic fallacy with the casino story, and have insisted that the sterilized randomness of games does not resemble randomness in real life. Look again at Figure 7 in Chapter 15. The dice average out so quickly that I can say with certainty that the casino will beat me in the very near long run at, say, roulette, as the noise will cancel out, though not the skills (here, the casino’s advantage). The more you extend the period (or reduce the size of the bets) the more randomness, by virtue of averaging, drops out of these gambling constructs.
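A quick simulation shows why the noise drops out while the house advantage does not. The single-zero (European) roulette setup and even-money bets on red are my illustrative assumptions.

```python
# Sketch: the bettor's luck shrinks with the number of bets, the house edge stays.
import numpy as np

rng = np.random.default_rng(1)
edge = 18 / 37 - 19 / 37                    # expected result per unit bet, about -2.7%

for n_bets in (10, 1_000, 100_000):
    wins = rng.binomial(n_bets, 18 / 37, size=5_000)    # 5,000 simulated bettors
    per_bet_avg = (2 * wins - n_bets) / n_bets          # +1 per win, -1 per loss
    print(f"{n_bets:>7} bets: house edge {edge:+.3f}, "
          f"spread across bettors {per_bet_avg.std():.3f}")
# After 10 bets, luck swamps the edge; after 100,000 bets the spread is tiny and
# essentially every bettor sits near -2.7% -- only the casino's skill remains.
```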

The ludic fallacy is present in the following chance setups: random walk, dice throwing, coin flipping, the infamous digital “heads or tails” expressed as 0 or 1, Brownian motion (which corresponds to the movement of pollen particles in water), and similar examples. These setups generate a quality of randomness that does not even qualify as randomness – protorandomness would be a more appropriate designation. At their core, all theories built around the ludic fallacy ignore a layer of uncertainty. Worse, their proponents do not know it!

One severe application of such focus on small, as opposed to large, uncertainty concerns the hackneyed greater uncertainty principle.

Find the Phony

The greater uncertainty principle states that in quantum physics, one cannot measure certain pairs of values (with arbitrary precision), such as the position and momentum of particles. You will hit a lower bound of measurement: what you gain in the precision of one, you lose in the other. So there is an incompressible uncertainty that, in theory, will defy science and forever remain an uncertainty. This minimum uncertainty was discovered by Werner Heisenberg in 1927. I find it ludicrous to present the uncertainty principle as having anything to do with uncertainty. Why? First, this uncertainty is Gaussian. On average, it will disappear – recall that no one person’s weight will significantly change the total weight of a thousand people. We may always remain uncertain about the future positions of small particles, but these uncertainties are very small and very numerous, and they average out – for Pluto’s sake, they average out! They obey the law of large numbers we discussed in Chapter 15. Most other types of randomness do not average out! If there is one thing on this planet that is not so uncertain, it is the behavior of a collection of subatomic particles! Why? Because, as I have said earlier, when you look at an object, composed of a collection of particles, the fluctuations of the particles tend to balance out.

But political, social, and weather events do not have this handy property, and we patently cannot predict them, so when you hear “experts” presenting the problems of uncertainty in terms of subatomic particles, odds are that the expert is a phony. As a matter of fact, this may be the best way to spot a phony.

I often hear people say, “Of course there are limits to our knowledge”, then invoke the greater uncertainty principle as they try to explain that “we cannot model everything” – I have heard such types as the economist Myron Scholes say this at conferences. But I am sitting here in New York, in August 2006, trying to go to my ancestral village of Amioun, Lebanon. Beirut’s airport is closed owing to the conflict between Israel and the Shiite militia Hezbollah. There is no published airline schedule that will inform me when the war will end, if it ends. I can’t figure out if my house will be standing, if Amioun will still be on the map – recall that the family house was destroyed once before. I can’t figure out whether the war is going to degenerate into something even more severe. Looking into the outcome of the war, with all my relatives, friends, and property exposed to it, I face true limits of knowledge. Can someone explain to me why I should care about subatomic particles that, anyway, converge to a Gaussian? People can’t predict how long they will be happy with recently acquired objects, how long their marriages will last, how their new jobs will turn out, yet it’s subatomic particles that they cite as “limits of prediction”. They’re ignoring a mammoth standing in front of them in favor of matter even a microscope would not allow them to see.

Can Philosophers Be Dangerous to Society?

I will go further: people who worry about pennies instead of dollars can be dangerous to society. They mean well, but, invoking my Bastiat argument of Chapter 8, they are a threat to us. They are wasting our studies of uncertainty by focusing on the insignificant. Our resources (both cognitive and scientific) are limited, perhaps too limited. Those who distract us increase the risk of Black Swans.

This commoditization of the notion of uncertainty as symptomatic of Black Swan blindness is worth discussing further here.

Given that people in finance and economics are steeped in the Gaussian to the point of choking on it, I looked for financial economists with philosophical bents to see how their critical thinking allows them to handle this problem. I found a few. One such person got a PhD in philosophy, then, four years later, another in finance; he published papers in both fields, as well as numerous textbooks in finance. But I was disheartened by him: he seemed to have compartmentalized his ideas on uncertainty so that he had two distinct professions: philosophy and quantitative finance. The problem of induction, Mediocristan, epistemic opacity, or the offensive assumption of the Gaussian – these did not hit him as true problems. His numerous textbooks drilled Gaussian methods into students’ heads, as though their author had forgotten that he was a philosopher. Then he promptly remembered that he was when writing philosophy texts on seemingly scholarly matters.

The same context specificity leads people to take the escalator to the StairMasters, but the philosopher’s case is far, far more dangerous since he uses up our storage for critical thinking in a sterile occupation. Philosophers like to practice philosophical thinking on me-too subjects that other philosophers call philosophy, and they leave their minds at the door when they are outside of these subjects.

The Problem of Practice

As much as I rail against the bell curve, Platonicity, and the ludic fallacy, my principal problem is not so much with statisticians – after all, these are computing people, not thinkers. We should be far less tolerant of philosophers, with their bureaucratic apparatchiks closing our minds. Philosophers, the watchdogs of critical thinking, have duties beyond those of other professions.

HOW MANY WITTGENSTEINS CAN DANCE ON THE HEAD OF A PIN?

A number of semishabbily dressed (but thoughtful-looking) people gather in a room, silently looking at a guest speaker. They are all professional philosophers attending the prestigious weekly colloquium at a New York-area university. The speaker sits with his nose drowned in a set of typewritten pages, from which he reads in a monotone voice. He is hard to follow, so I daydream a bit and lose his thread. I can vaguely tell that the discussion revolves around some “philosophical” debate about Martians invading your head and controlling your will, all the while preventing you from knowing it. There seem to be several theories concerning this idea, but the speaker’s opinion differs from those of other writers on the subject. He spends some time showing where his research on these head-hijacking Martians is unique. After his monologue (fifty-five minutes of relentless reading of the typewritten material) there is a short break, then another fifty-five minutes of discussion about Martians planting chips and other outlandish conjectures. Wittgenstein is occasionally mentioned (you can always mention Wittgenstein since he is vague enough to always seem relevant).

Every Friday, at four P.M., the paychecks of these philosophers will hit their respective bank accounts. A fixed proportion of their earnings, about 16 percent on average, will go into the stock market in the form of an automatic investment into the university’s pension plan. These people are professionally employed in the business of questioning what we take for granted; they are trained to argue about the existence of god(s), the definition of truth, the redness of red, the meaning of meaning, the difference between the semantic theories of truth, conceptual and nonconceptual representations … Yet they believe blindly in the stock market, and in the abilities of their pension plan manager. Why do they do so? Because they accept that this is what people should do with their savings, because “experts” tell them so. They doubt their own senses, but not for a second do they doubt their automatic purchases in the stock market. This domain dependence of skepticism is no different from that of medical doctors (as we saw in Chapter 8).

Beyond this, they may believe without question that we can predict societal events, that the Gulag will toughen you a bit, that politicians know more about what is going on than their drivers, that the chairman of the Federal Reserve saved the economy, and so many such things. They may also believe that nationality matters (they always stick “French”, “German”, or “American” in front of a philosopher’s name, as if this has something to do with anything he has to say). Spending time with these people, whose curiosity is focused on regimented on-the-shelf topics, feels stifling.

Where Is Popper When You Need Him?

I hope I’ve sufficiently drilled home the notion that, as a practitioner, my thinking is rooted in the belief that you cannot go from books to problems, but the reverse, from problems to books. This approach incapacitates much of that career-building verbiage. A scholar should not be a library’s tool for making another library, as in the joke by Daniel Dennett. Of course, what I am saying here has been said by philosophers before, at least by the real ones. The following remark is one reason I have inordinate respect for Karl Popper; it is one of the few quotations in this book that I am not attacking.

The degeneration of philosophical schools in its turn is the consequence of the mistaken belief that one can philosophize without having been compelled to philosophize by problems outside philosophy. … Genuine philosophical problems are always rooted outside philosophy and they die if these roots decay. … [emphasis mine] These roots are easily forgotten by philosophers who “study” philosophy instead of being forced into philosophy by the pressure of nonphilosophical problems.

Such thinking may explain Popper’s success outside philosophy, particularly with scientists, traders, and decision makers, as well as his relative failure inside of it. (He is rarely studied by his fellow philosophers; they prefer to write essays on Wittgenstein.)

Also note that I do not want to be drawn into philosophical debates with my Black Swan idea. What I mean by Platonicity is not so metaphysical. Plenty of people have argued with me about whether I am against “essentialism” (i.e., things that I hold don’t have a Platonic essence), if I believe that mathematics would work in an alternative universe, or some such thing. Let me set the record straight. I am a no-nonsense practitioner; I am not saying that mathematics does not correspond to an objective structure of reality; my entire point is that we are, epistemologically speaking, putting the cart before the horse and, of the space of possible mathematics, risk using the wrong one and being blinded by it. I truly believe that there are some mathematics that work, but that these are not as easily within our reach as it seems to the “confirmators”.

The Bishop and the Analyst

I am most often irritated by those who attack the bishop but somehow fall for the securities analyst – those who exercise their skepticism against religion but not against economists, social scientists, and phony statisticians. Using the confirmation bias, these people will tell you that religion was horrible for mankind by counting deaths from the Inquisition and various religious wars. But they will not show you how many people were killed by nationalism, social science, and political theory under Stalinism or during the Vietnam War. Even priests don’t go to bishops when they feel ill: their first stop is the doctor’s. But we stop by the offices of many pseudo-scientists and “experts” without alternative. We no longer believe in papal infallibility; we seem to believe in the infallibility of the Nobel, though, as we saw in Chapter 17.

Easier Than You Think: The Problem of Decision Under Skepticism

I have said all along that there is a problem with induction and the Black Swan. In fact, matters are far worse: we may have no less of a problem with phony skepticism.


a. I can’t do anything to stop the sun from nonrising tomorrow (no matter how hard I try),

b. I can’t do anything about whether or not there is an afterlife,

c. I can’t do anything about Martians or demons taking hold of my brain.


But I have plenty of ways to avoid being a sucker. It is not much more difficult than that.


I conclude Part Three by reiterating that my antidote to Black Swans is precisely to be noncommoditized in my thinking. But beyond avoiding being a sucker, this attitude lends itself to a protocol of how to act – not how to think, but how to convert knowledge into action and figure out what knowledge is worth. Let us examine what to do or not do with this in the concluding section of this book.
