The real purpose of the scientific method is to make sure nature hasn’t misled you into thinking you know something you actually don’t know.
Robert Pirsig, Zen and the Art of Motorcycle Maintenance
Why do we have statistics, why do we measure things, and why do we count? If the scientific method has any authority – or as I prefer to think of it, ‘value’ – it is because it represents a systematic approach; but this is valuable only because the alternatives can be misleading. When we reason informally – call it intuition, if you like – we use rules of thumb which simplify problems for the sake of efficiency. Many of these shortcuts have been well characterised in a field called ‘heuristics’, and they are efficient ways of knowing in many circumstances.
This convenience comes at a cost – false beliefs – because there are systematic vulnerabilities in these truth-checking strategies which can be exploited. This is not dissimilar to the way that paintings can exploit shortcuts in our perceptual system: as objects become more distant, they appear smaller, and ‘perspective’ can trick us into seeing three dimensions where there are only two, by taking advantage of this strategy used by our depth-checking apparatus. When our cognitive system – our truth-checking apparatus – is fooled, then, much like seeing depth in a flat painting, we come to erroneous conclusions about abstract things. We might misidentify normal fluctuations as meaningful patterns, for example, or ascribe causality where in fact there is none.
These are cognitive illusions, a parallel to optical illusions. They can be just as mind-boggling, and they cut to the core of why we do science, rather than basing our beliefs on intuition informed by a ‘gist’ of a subject acquired through popular media: because the world does not provide you with neatly tabulated data on interventions and outcomes. Instead it gives you random, piecemeal data in dribs and drabs over time, and trying to construct a broad understanding of the world from a memory of your own experiences would be like looking at the ceiling of the Sistine Chapel through a long, thin cardboard tube: you can try to remember the individual portions you’ve spotted here and there, but without a system and a model, you’re never going to appreciate the whole picture.
Let’s begin.
Randomness
As human beings, we have an innate ability to make something out of nothing. We see shapes in the clouds, and a man in the moon; gamblers are convinced that they have ‘runs of luck’; we take a perfectly cheerful heavy-metal record, play it backwards, and hear hidden messages about Satan. Our ability to spot patterns is what allows us to make sense of the world; but sometimes, in our eagerness, we are oversensitive, trigger-happy, and mistakenly spot patterns where none exist.
In science, if you want to study a phenomenon, it is sometimes useful to reduce it to its simplest and most controlled form. There is a prevalent belief among sporting types that sportsmen, like gamblers (except more plausibly), have ‘runs of luck’. People ascribe this to confidence, ‘getting your eye in’, ‘warming up’ and so on, and while it might exist in some games, statisticians have looked in various places where people have claimed it to exist and found no relationship between, say, hitting a home run in one inning and hitting another in the next.
Because the ‘winning streak’ is such a prevalent belief, it is an excellent model for looking at how we perceive random sequences of events. This was used by an American social psychologist called Thomas Gilovich in a classic experiment. He took basketball fans and showed them a random sequence of X’s and O’s, explaining that they represented a player’s hits and misses, and then asked them if they thought the sequences demonstrated ‘streak shooting’.
Here is a random sequence of figures from that experiment. You might think of it as being generated by a series of coin tosses.
OXXXOXXXOXXOOOXOOXXOO
The subjects in the experiment were convinced that this sequence exemplified ‘streak shooting’ or ‘runs of luck’, and it’s easy to see why, if you look again: six of the first eight shots were hits. No, wait: eight of the first eleven shots were hits. No way is that random …
What this ingenious experiment shows is how bad we are at correctly identifying random sequences. We are wrong about what they should look like: we expect too much alternation, so truly random sequences seem somehow too lumpy and ordered. Our intuitions about the most basic observation of all – distinguishing a pattern from mere random background noise – are deeply flawed.
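If you want to check this for yourself, here is a rough sketch in Python – my own illustration, not part of Gilovich’s experiment – which generates random sequences of twenty-one ‘shots’ from a fair coin and counts how often they contain a streak of three or more hits in a row:

```python
# Sketch: how often does a purely random sequence of 21 shots
# contain a 'streak' of three or more hits in a row?
import random

random.seed(0)
trials = 100_000
with_streak = 0

for _ in range(trials):
    shots = [random.choice("XO") for _ in range(21)]  # X = hit, O = miss, 50/50
    longest = run = 0
    for shot in shots:
        run = run + 1 if shot == "X" else 0   # length of the current run of hits
        longest = max(longest, run)
    if longest >= 3:
        with_streak += 1

print(with_streak / trials)  # roughly 0.8
```

It comes out at roughly 80 per cent of the time: runs like the ones you spotted above are not evidence of anything; they are simply what randomness looks like.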
This is our first lesson in the importance of using statistics instead of intuition. It’s also an excellent demonstration of how strong the parallels are between these cognitive illusions and the perceptual illusions with which we are more familiar. You can stare at a visual illusion all you like, talk or think about it, but it will still look ‘wrong’. Similarly, you can look at that random sequence above as hard as you like: it will still look lumpy and ordered, in defiance of what you now know.
Regression to the mean
We have already looked at regression to the mean in our section on homeopathy: it is the phenomenon whereby, when things are at their extremes, they are likely to settle back down to the middle, or ‘regress to the mean’.
We saw this with reference to the Sports Illustrated jinx (and Bruce Forsyth’s Play Your Cards Right), but also applied it to the matter in hand, the question of people getting better: we discussed how people will do something when their back pain is at its worst – visit a homeopath, perhaps – and how although it was going to get better anyway (because when things are at their worst they generally do), they ascribe their improvement to the treatment.
There are two discrete things happening when we fall prey to this failure of intuition. Firstly, we have failed to correctly spot the pattern of regression to the mean. Secondly, crucially, we have then decided that something must have caused this illusory pattern: specifically, a homeopathic remedy, for example. Simple regression is confused with causation, and this is perhaps quite natural for animals like humans, whose success in the world depends on our being able to spot causal relationships rapidly and intuitively: we are inherently oversensitive to them.
To an extent, when we discussed the subject earlier I relied on your good will, and on the likelihood that from your own experience you could agree that this explanation made sense. But it has been demonstrated in another ingeniously pared-down experiment, where all the variables were controlled, but people still saw a pattern, and causality, where there was none.
The subjects in the experiment played the role of a teacher trying to make a child arrive punctually at school for 8.30 a.m. They sat at a computer on which it appeared that each day, for fifteen consecutive days, the supposed child would arrive some time between 8.20 and 8.40; but unbeknownst to the subjects, the arrival times were entirely random, and predetermined before the experiment began. Nonetheless, the subjects were all allowed to use punishments for lateness, and rewards for punctuality, in whatever permutation they wished. When they were asked at the end to rate their strategy, 70 per cent concluded that reprimand was more effective than reward in producing punctuality from the child.
These subjects were convinced that their intervention had an effect on the punctuality of the child, despite the child’s arrival time being entirely random, and exemplifying nothing more than ‘regression to the mean’. By the same token, when homeopathy has been shown to elicit no more improvement than placebo, people are still convinced that it has a beneficial effect on their health.
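You can watch this illusion being manufactured in a few lines of code. Here is a rough sketch – my own illustration, with made-up numbers rather than the original study’s materials, and with the 8.35 ‘reprimand’ cut-off chosen purely for the example – in which the child’s arrival times are drawn completely at random between 8.20 and 8.40. If you look only at the days bad enough to earn a reprimand, the next day’s arrival is earlier the great majority of the time, which is exactly what a ‘successful punishment’ would look like:

```python
# Sketch: random arrival times between 8.20 and 8.40, with no effect
# of any 'intervention' at all. How often is a very late day followed
# by an earlier one, purely through regression to the mean?
import random

random.seed(1)
trials = 10_000
late_days = 0
improved_after_late = 0

for _ in range(trials):
    arrivals = [random.uniform(8 + 20/60, 8 + 40/60) for _ in range(15)]
    for today, tomorrow in zip(arrivals, arrivals[1:]):
        if today > 8 + 35/60:            # unusually late: the days you would reprimand
            late_days += 1
            if tomorrow < today:         # 'improvement' the following day
                improved_after_late += 1

print(improved_after_late / late_days)   # roughly 0.87
```

Reprimand the child on the very latest days and, almost nine times out of ten, the next day looks better – on data that is pure noise.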
To recap:
We see patterns where there is only random noise.
We see causal relationships where there are none.
These are two very good reasons to measure things formally. It’s bad news for intuition already. Can it get much worse?
The bias towards positive evidence
It is the peculiar and perpetual error of the human understanding to be more moved and excited by affirmatives than negatives.
Francis Bacon
It gets worse. It seems we have an innate tendency to seek out and overvalue evidence that confirms a given hypothesis. To try to remove this phenomenon from the controversial arena of CAM – or the MMR scare, which is where this is headed – we are lucky to have more pared-down experiments which illustrate the general point.
Imagine a table with four cards on it, marked ‘A’, ‘B’, ‘2’ and ‘3’. Each card has a letter on one side, and a number on the other. Your task is to determine whether all cards with a vowel on one side have an even number on the other. Which two cards would you turn over? Everybody chooses the ‘A’ card, obviously, but like many people – unless you really forced yourself to think hard about it – you would probably choose to turn over the ‘2’ card as well. That’s because these are the cards which would produce information consistent with the hypothesis you are supposed to be testing. But in fact, the cards you need to flip are the ‘A’ and the ‘3’, because finding a vowel on the back of the ‘2’ would tell you nothing about ‘all cards’, it would just confirm ‘some cards’; whereas finding a vowel on the back of the ‘3’ would comprehensively disprove your hypothesis. This modest brainteaser demonstrates our tendency, in our unchecked intuitive reasoning style, to seek out information that confirms a hypothesis: and it demonstrates the phenomenon in a value-neutral situation.
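If you want the logic of the four cards spelled out mechanically, here is a small sketch – my own illustration, using hypothetical pools of hidden faces – which checks, for each visible card, whether any possible hidden face could falsify the rule ‘a vowel on one side means an even number on the other’. Only a card that could falsify the rule is worth turning over:

```python
# Sketch: which of the four cards could possibly falsify the rule
# 'every card with a vowel on one side has an even number on the other'?
VOWELS = set("AEIOU")

def is_even_number(face):
    return face.isdigit() and int(face) % 2 == 0

def can_falsify(visible, possible_hidden_faces):
    """True if some hidden face would break the rule for this card."""
    for hidden in possible_hidden_faces:
        if visible in VOWELS and hidden.isdigit() and not is_even_number(hidden):
            return True   # vowel showing, odd number hidden
        if hidden in VOWELS and visible.isdigit() and not is_even_number(visible):
            return True   # odd number showing, vowel hidden
    return False

letters = list("ABCDEFGHIJ")              # hypothetical hidden letters
numbers = [str(n) for n in range(10)]     # hypothetical hidden numbers

for card in ["A", "B", "2", "3"]:
    hidden_pool = numbers if card.isalpha() else letters
    print(card, "worth turning over:", can_falsify(card, hidden_pool))
# Only 'A' and '3' come out as worth turning over.
```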
This same bias in seeking out confirmatory information has been demonstrated in more sophisticated social psychology experiments. When trying to determine if someone is an ‘extrovert’, for example, many subjects will ask questions for which a positive answer would confirm the hypothesis (‘Do you like going to parties?’) rather than refute it.
We show a similar bias when we interrogate information from our own memory. In one experiment, subjects read a vignette about a woman who exemplified various introverted and extroverted behaviours, and were then divided into two groups. One group was asked to consider her suitability for a job as a librarian, while the other was asked to consider her suitability for a job as an estate agent. Both groups were asked to come up with examples of both her extroversion and her introversion. The group considering her for the librarian job recalled more examples of introverted behaviour, while the group considering her for a job selling real estate cited more examples of extroverted behaviour.
This tendency is dangerous, because if you only ask questions that confirm your hypothesis, you will be more likely to elicit information that confirms it, giving a spurious sense of confirmation. It also means – thinking more broadly – that the people who pose the questions already have a head start in popular discourse.
So we can add to our running list of cognitive illusions, biases and failings of intuition:
We overvalue confirmatory information for any given hypothesis.
We seek out confirmatory information for any given hypothesis.
Biased by our prior beliefs
[I] followed a golden rule, whenever a new observation or thought came across me, which was opposed to my general results, to make a memorandum of it without fail and at once; for I had found by experience that such facts and thoughts were far more apt to escape from the memory than favourable ones.
Charles Darwin
This is the reasoning flaw that everybody does know about, and even if it’s the least interesting cognitive illusion – because it’s an obvious one – it has been demonstrated in experiments which are so close to the bone that you may find them, as I do, quite unnerving.
The classic demonstration of people being biased by their prior beliefs comes from a study looking at beliefs about the death penalty. A large group of proponents and opponents of state executions was recruited. They were all shown two pieces of evidence on the deterrent effect of capital punishment: one supporting a deterrent effect, the other providing evidence against it.
The evidence they were shown was as follows:
A comparison of murder rates in one US state before the death penalty was brought in, and after.
A comparison of murder rates in different states, some with, and some without, the death penalty.
But there was a very clever twist. The proponents and opponents of capital punishment were each further divided into two smaller groups, and the direction of the evidence was flipped between them: for one subgroup, the before/after comparison supported a deterrent effect while the state/state comparison undermined it; for the other subgroup, it was the other way around. So, overall, half of the proponents and opponents of capital punishment had their opinion reinforced by before/after data, but challenged by state/state data, and vice versa.
Asked about the evidence, the subjects confidently uncovered flaws in the methods of the research that went against their pre-existing view, but downplayed the flaws in the research that supported their view. Half the proponents of capital punishment, for example, picked holes in the idea of state/state comparison data, on methodological grounds, because that was the data that went against their view, while they were happy with the before/after data; but the other half of the proponents of capital punishment rubbished the before/after data, because in their case they had been exposed to before/after data which challenged their view, and state/state data which supported it.
Put simply, the subjects’ faith in research data was not predicated on an objective appraisal of the research methodology, but on whether the results validated their pre-existing views. This phenomenon reaches its pinnacle in alternative therapists – or scaremongers – who unquestioningly champion anecdotal data, whilst meticulously examining every large, carefully conducted study on the same subject for any small chink that would permit them to dismiss it entirely.
This, once again, is why it is so important that we have clear strategies available to us to appraise evidence, regardless of its conclusions, and this is the major strength of science. In a systematic review of the scientific literature, investigators will sometimes mark the quality of the ‘methods’ section of a study blindly – that is, without looking at the ‘results’ section – so that it cannot bias their appraisal. Similarly, in medical research there is a hierarchy of evidence: a well performed trial is more significant than survey data in most contexts, and so on.
So we can add to our list of new insights about the flaws in intuition:
Our assessment of the quality of new evidence is biased by our previous beliefs.
Availability
We spend our lives spotting patterns, and picking out the exceptional and interesting things. You don’t waste cognitive effort, every time you walk into your house, noticing and analysing all the many features in the visually dense environment of your kitchen. You do notice the broken window and the missing telly.
When information is made more ‘available’, as psychologists call it, it becomes disproportionately prominent. There are a number of ways this can happen, and you can get a sense of them from a few famous psychology experiments on the phenomenon.
In one, subjects were read a list of male and female names, in equal number, and then asked at the end whether there were more men or women in the list: when the men in the list had names like Ronald Reagan, but the women were unheard of, people tended to answer that there were more men than women; and vice versa.
Our attention is always drawn to the exceptional and the interesting, and if you have something to sell, it makes sense to guide people’s attention to the features you most want them to notice. When fruit machines pay up, they make a theatrical ‘kerchunk-kerchunk’ sound with every coin they spit out, so that everybody in the pub can hear it; but when you lose, they don’t draw attention to themselves. Lottery companies, similarly, do their absolute best to get their winners prominently into the media; but it goes without saying that you, as a lottery loser, have never had your outcome paraded for the TV cameras.
The anecdotal success stories about CAM – and the tragic anecdotes about the MMR vaccine – are disproportionately misleading, not just because the statistical context is missing, but because of their ‘high availability’: they are dramatic, associated with strong emotion, and amenable to strong visual imagery. They are concrete and memorable, rather than abstract. No matter what you do with statistics about risk or recovery, your numbers will always have inherently low psychological availability, unlike miracle cures, scare stories, and distressed parents.
It’s because of ‘availability’, and our vulnerability to drama, that people are more afraid of sharks at the beach, or of fairground rides on the pier, than they are of flying to Florida, or driving to the coast. This phenomenon is even demonstrated in patterns of smoking cessation amongst doctors: you’d imagine, since they are rational actors, that all doctors would simultaneously have seen sense and stopped smoking once they’d read the studies showing the phenomenally compelling relationship between cigarettes and lung cancer. These are men of applied science, after all, who are able, every day, to translate cold statistics into meaningful information and beating human hearts.
But in fact, from the start, doctors working in specialities like chest medicine and oncology – where they witnessed patients dying of lung cancer with their own eyes – were proportionately more likely to give up cigarettes than their colleagues in other specialities. Being shielded from the emotional immediacy and drama of consequences matters.
Social influences
Last in our whistle-stop tour of irrationality comes our most self-evident flaw. It feels almost too obvious to mention, but our values are socially reinforced by conformity and by the company we keep. We are selectively exposed to information that revalidates our beliefs, partly because we expose ourselves to situations where those beliefs are apparently confirmed; partly because we ask questions that will – by their very nature, for the reasons described above – give validating answers; and partly because we selectively expose ourselves to people who validate our beliefs.
It’s easy to forget the phenomenal impact of conformity. You doubtless think of yourself as a fairly independent-minded person, and you know what you think. I would suggest that the same beliefs were held by the subjects of Asch’s experiments into social conformity. These subjects were placed near one end of a line of actors who presented themselves as fellow experimental subjects, but were actually in cahoots with the experimenters. Cards were held up with one line marked on them, and then another card was held up with three lines of different lengths: six inches, eight inches, ten inches.
Everyone called out in turn which line on the second card was the same length as the line on the first. For six of the eighteen pairs of cards the accomplices gave the correct answer; but for the other twelve they called out the wrong answer. All but a quarter of the experimental subjects went along with the incorrect answer from the crowd of accomplices on at least one occasion, defying the clear evidence of their own senses.
That’s an extreme example of conformity, but the phenomenon is all around us. ‘Communal reinforcement’ is the process by which a claim becomes a strong belief, through repeated assertion by members of a community. The process is independent of whether the claim has been properly researched, or is supported by empirical data significant enough to warrant belief by reasonable people.
Communal reinforcement goes a long way towards explaining how religious beliefs can be passed on in communities from generation to generation. It also explains how testimonials within communities of therapists, psychologists, celebrities, theologians, politicians, talk-show hosts, and so on, can supplant and become more powerful than scientific evidence.
When people learn no tools of judgement and merely follow their hopes, the seeds of political manipulation are sown.
Stephen Jay Gould
There are many other well-researched areas of bias. We have a disproportionately high opinion of ourselves, which is nice. A large majority of the public think they are more fair-minded, less prejudiced, more intelligent and more skilled at driving than the average person, when of course only half of us can be better than the median. Most of us exhibit something called ‘attributional bias’: we believe our successes are due to our own internal faculties, and our failures are due to external factors; whereas for others, we believe their successes are due to luck, and their failures to their own flaws. We can’t all be right.
Lastly, we use context and expectation to bias our appreciation of a situation – because, in fact, that’s the only way we can think. Artificial intelligence research has drawn a blank so far largely because of something called the ‘frame problem’: you can tell a computer how to process information, and give it all the information in the world, but as soon as you give it a real-world problem – a sentence to understand and respond to, for example – computers perform much worse than we might expect, because they don’t know what information is relevant to the problem. This is something humans are very good at – filtering out irrelevant information – but that skill comes at a cost: we give disproportionate weight to some contextual data.
We tend to assume, for example, that positive characteristics cluster: people who are attractive must also be good; people who seem kind might also be intelligent and well-informed. Even this has been demonstrated experimentally: identical essays in neat handwriting score higher than messy ones; and the behaviour of sporting teams which wear black is rated as more aggressive and unfair than that of teams which wear white.
And no matter how hard you try, sometimes things just are very counterintuitive, especially in science. Imagine there are twenty-three people in a room. What is the chance that two of them celebrate their birthday on the same date? One in two.
When it comes to thinking about the world around you, you have a range of tools available. Intuitions are valuable for all kinds of things, especially in the social domain: deciding if your girlfriend is cheating on you, perhaps, or whether a business partner is trustworthy. But for mathematical issues, or assessing causal relationships, intuitions are often completely wrong, because they rely on shortcuts which have arisen as handy ways to solve complex cognitive problems rapidly, but at a cost of inaccuracies, misfires and oversensitivity.
It’s not safe to let our intuitions and prejudices run unchecked and unexamined: it’s in our interest to challenge these flaws in intuitive reasoning wherever we can, and the methods of science and statistics grew up specifically in opposition to these flaws. Their thoughtful application is our best weapon against these pitfalls, and the challenge, perhaps, is to work out which tools to use where. Because trying to be ‘scientific’ about your relationship with your partner is as stupid as following your intuitions about causality.
Now let’s see how journalists deal with stats.
I’d be genuinely intrigued to know how long it takes to find someone who can tell you the difference between ‘median’, ‘mean’ and ‘mode’, from where you are sitting right now.
If it helps to make this feel a bit more plausible, bear in mind that you only need any two dates to coincide. With forty-seven people, the probability increases to 0.95: that’s nineteen times out of twenty! (Fifty-seven people and it’s 0.99; seventy people and it’s 0.999.) This is beyond your intuition: at first glance, it makes no sense at all.
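If you don’t believe these figures, here is a quick way to check them – a sketch using the standard calculation, which ignores leap years and assumes birthdays fall evenly across the year:

```python
# Sketch: probability that at least two people in a group of n share a birthday.
def shared_birthday_probability(n):
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (365 - i) / 365   # each new person must miss every earlier birthday
    return 1 - p_all_different

for n in (23, 47, 57, 70):
    print(n, round(shared_birthday_probability(n), 3))
# 23 -> 0.507, 47 -> 0.955, 57 -> 0.99, 70 -> 0.999
```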