5. LANGUAGE

One morning I shot an elephant in my pajamas. How he got into my pajamas I’ll never know.

— GROUCHO MARX

SHE SELLS SEASHELLS by the seashore. A pleasant peasant pheasant plucker plucks a pleasant pheasant. These are words that twist the tongue.

Human language may seem majestic, from the perspective of a vervet monkey confined to a vocabulary of three words (roughly, eagle, snake, and leopard). But in reality, language is filled with foibles, imperfections, and idiosyncrasies, from the way we pronounce words to the ways we put together sentences. We start, we stop, we stutter, we use like as a punctuation marker; we swap our consonants like the good Reverend William Archibald Spooner (1844-1930), who turned Shakespeare’s one fell swoop into one swell foop. (A real smart feller becomes a real… well, you get the idea.) We may say bridge of the neck when we really mean bridge of the nose; we may mishear All of the members of the group grew up in Philadelphia as All of the members of the group threw up in Philadelphia. Mistakes like these[27] are a tic of the human mind.

The challenge, for the cognitive scientist, is to figure out which idiosyncrasies are really important. Most are mere trivia, amusing but not reflective of the deep structures of the mind. The word driveway, for example, used to refer to driving on a private road that went from a main road to a house. In truth, we still drive on (or at least into) driveways, but we scarcely notice the driving part, since the drive is short; the word’s meaning shifted when real estate boomed and our ideas of landscaping changed. (The park in parkways had nothing to do with parking, but rather with roads that ran along or through parks, woodsy green places that have given way to suburbs and the automobile.) Yet facts like these reveal nothing deep about the mind because other languages are free to do things more systematically, so that cars would park, for example, in a Parkplatz.

Likewise it is amusing but not deeply significant to note that we “relieve” ourselves in water closets and bathrooms, even though our W.C.’s are bigger than closets and our bathrooms have no baths. (For that matter, public restrooms may be public, and may be rooms, but I’ve never seen anyone rest in one.) But our reluctance to say where we plan to go when we “have to go” isn’t really a flaw in language; it’s just a circumlocution, a way of talking around the details in order to be polite.

Some of the most interesting linguistic quirks, however, run deeper, reflecting not just the historical accidents of particular languages, but fundamental truths about those creatures that produce language — namely, us.

Consider, for instance, the fact that all languages are rife with ambiguity, not just the sort we use deliberately (“I can’t recommend this person enough”) or that foreigners produce by accident (like the hotel that advised its patrons to “take advantage of the chambermaid”), but the sort that ordinary people produce quite by accident, sometimes with disastrous consequences. One such case occurred in 1977, when a pilot’s ambiguous radio call about his status (“at takeoff”) contributed to a runway collision that killed 583 people; the pilot meant that he was in the process of taking off, but air traffic control understood him to be waiting at the takeoff point.

To be perfect, a language would presumably have to be unambiguous (except perhaps where deliberately intended to be ambiguous), systematic (rather than idiosyncratic), stable (so that, say, grandparents could communicate with their grandchildren), nonredundant (so as not to waste time or energy), and capable of expressing any and all of our thoughts.[28] Every instance of a given speech sound would invariably be pronounced in a constant way, each sentence as clean as a mathematical formula. In the words of one of the leading philosophers of the twentieth century, Bertrand Russell,

in a logically perfect language, there will be one word and no more for every simple object, and everything that is not simple will be expressed by a combination of words, by a combination derived, of course, from the words for the simple things that enter in, one word for each simple component. A language of that sort will be completely analytic, and will show at a glance the logical structure of the facts asserted or denied.

Every human language falls short of this sort of perfection. Russell was probably wrong in his first point — it’s actually quite handy (logical, even) for a language to allow for the household pet to be referred to as Fido, a dog, a poodle, a mammal, and an animal — but right in thinking that in an ideal language, words would be systematically related in meaning and in sound. But this is distinctly not the case. The words jaguar, panther, ocelot, and puma, for example, sound totally different, yet all refer to felines, while hardly any of the words that sound like cat — cattle, catapult, catastrophe — have any connection to cats.

Meanwhile, in some cases language seems redundant (couch and sofa mean just about the same thing), and in others, incomplete (for example, no language can truly do justice to the subtleties of what we can smell). Other thoughts that seem perfectly coherent can be surprisingly difficult to express; the sentence Whom do you think that John left? (where the answer is, say, Mary, his first wife) is grammatical, but the ostensibly similar Whom do you think that left Mary? (where the answer would be John) is not. (A number of linguists have tried to explain this phenomenon, but it’s hard to understand why this asymmetry should exist at all; there’s no real analogy in mathematics or computer languages.)

Ambiguity, meanwhile, seems to be the rule rather than the exception. A run can mean anything from a jog to a tear in a stocking to scoring a point in baseball, a hit anything from a smack to a best-selling tune. When I say “I’ll give you a ring tomorrow,” am I promising a gift of jewelry or just a phone call? Even little words can be ambiguous; as Bill Clinton famously said, “It all depends on what the meaning of the word ‘is’ is.” Meanwhile, even when the individual words are clear, sentences as a whole may not be: does Put the book on the towel on the table mean that there is a book on the towel that ought to be on the table or that a book, which ought to be on a towel, is already on the table?

Even in languages like Latin, which might — for all its cases and word endings — seem more systematic, ambiguities still crop up. For instance, because the subject of a verb can be left out, the third-person singular verb Amat can stand on its own as a complete sentence — but it might mean “He loves,” “She loves,” or “It loves.” As the fourth-century philosopher Augustine, author of one of the first essays on the topic of ambiguity, put it, in an essay written in the allegedly precise language of Latin, the “perplexity of ambiguity grows like wild flowers into infinity.”

And language falls short on our other criteria too. Take redundancy. From the perspective of maximizing communication relative to effort, it would make little sense to repeat ourselves. Yet English is full of redundancies. We have “pleonasms” like null and void, cease and desist, and for all intents and purposes, and pointless redundancies like advance planning. And then there’s the third-person singular suffix -s, which we use only when we can already tell from the subject that we have a third-person singular. The -s in he buys, relative to they buy, gives you no more information than if we just dropped the -s altogether and relied on the subject alone. The sentence These three dogs are retrievers conveys the notion of plurality not once but five times — in pluralizing the demonstrative pronoun (these as opposed to this), in the numeral (three), in the plural noun (dogs versus dog), in the verb (are versus is), and a final time in the final noun (retrievers versus retriever). In languages like Italian or Latin, which routinely omit subjects, a third-person plural marker makes sense; in English, which requires subjects, the third-person plural marker often adds nothing. Meanwhile, the phrase John’s picture, which uses the possessive -’s, is ambiguous in at least three ways. Does it refer to a picture John took of someone else (say, his sister)? A photo that someone else (say, his sister) took of him? Or a picture of something else altogether (say, a blue-footed booby, Sula nebouxii), taken by someone else (perhaps a photographer from National Geographic), which John merely happens to own?

And then there’s vagueness. In the sentence It’s warm outside, there’s no clear boundary between what counts as warm and what counts as not warm. Is it 70 degrees? 69? 68? 67? I can keep dropping degrees, but where do we draw the line? Or consider a word like heap. How many stones does it take to form a heap? Philosophers like to amuse themselves with the following mind-twister, known as a sorites (rhymes with pieties) paradox:

Clearly, one stone does not make a heap. If one stone is not enough to qualify as a heap of stones, nor should two, since adding one stone to a pile that is not a heap should not turn that pile into a heap. And if two stones don’t make a heap, three stones shouldn’t either — by a logic that seemingly ought to extend to infinity. Working in the opposite direction, a man with 10,000 hairs surely isn’t bald. But just as surely, plucking one hair from a man who is not bald should not produce a transition from not-bald to bald. So if a man with 9,999 hairs cannot be judged to be bald, the same should apply to a man with 9,998. Following the logic to its extreme, hair by hair, we are ultimately unable even to call a man with zero hairs “bald.”

If the boundary conditions of words were more precise, such reasoning (presumably fallacious) might not be so tempting.
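The contrast between a vague word and a precise one can be sketched in a few lines of code. The threshold below is an arbitrary stand-in of my own choosing; natural language supplies no such number, which is exactly what makes the sorites premise tempting:

```python
# A hypothetical crisp definition of "heap": at least `threshold` stones.
# (The value 100 is an illustrative assumption, not anything language gives us.)

def is_heap(stones: int, threshold: int = 100) -> bool:
    """Crisp stand-in for the vague word 'heap'."""
    return stones >= threshold

# Under a crisp definition, the sorites premise ("one more stone never
# turns a non-heap into a heap") fails exactly once, at the boundary:
flips = [n for n in range(1, 200) if not is_heap(n) and is_heap(n + 1)]
print(flips)  # [99]
```

With a vague predicate there is no single point at which the verdict flips, so the stone-by-stone induction slides, unimpeded, all the way to infinity.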

Adding to the complication is the undeniable fact that languages just can’t help but change over time. Sanskrit begat Hindi and Urdu; Latin begat French, Italian, Spanish, and Catalan. West Germanic begat Dutch, German, Yiddish, and Frisian. English, mixing its Anglo-Saxon monosyllables (Halt!) with its Greco-Latin impress-your-friends polysyllables (Abrogate all locomotion!), is the stepchild of French and West Germanic, a little bit country, a little bit rock-and-roll.

Even where institutions like l’Académie française try to legislate language, it remains unruly. L’Académie has tried to bar from French such English-derived words as le hamburger, le drugstore, le week-end, le strip-tease, le pull-over, le tee-shirt, le chewing gum, and la cover-girl — with no success whatsoever. With the rapid development of popular new technology — such as iPods, podcasts, cell phones, and DVDs — the world needs new words every day.[29]

Most of us rarely notice the instability or vagueness of language, even when our words and sentences aren’t precise, because we can decipher language by supplementing what grammar tells us with our knowledge of the world. But the fact that we can rely on something other than language — such as shared background knowledge — is no defense. When I “know what you mean” even though you haven’t said it, language itself has fallen short. And when languages in general show evidence of these same problems, they reflect not only cultural history but also the inner workings of the creatures who learn and use them.

Some of these facts about human language have been recognized for at least two millennia. Plato, for example, worried in his dialogue Cratylus that “the fine fashionable language of modern times has twisted and disguised and entirely altered the original meaning” of words. Wishing for a little more systematicity, he also suggested that “words should as far as possible resemble things… if we could always, or almost always, use likenesses, which are perfectly appropriate, this would be the most perfect state of language.”

From the time of twelfth-century mystic Hildegard of Bingen, if not earlier, some particularly brave people have tried to do something about the problem and attempted to build more sensible languages from scratch. One of the most valiant efforts was made by English mathematician John Wilkins (1614-1672), who addressed Plato’s concern about the systematicity of words. Why, for example, should cats, tigers, lions, leopards, jaguars, and panthers each be named differently, despite their obvious resemblance? In his 1668 work An Essay Towards a Real Character and a Philosophical Language, Wilkins sought to create a systematic “non-arbitrary” lexicon, reasoning that words ought to reflect the relations among things. In the process, he made a table of 40 major concepts, ranging from quantities, such as magnitude, space, and measure, to qualities, such as habit and sickness, and then he divided and subdivided each concept to a fine degree. The word de referred to the elements (earth, air, fire, and water), the word deb referred to fire, the first (in Wilkins’s scheme) of the elements, deba to a part of fire, namely a flame, and so forth, such that every word was carefully (and predictably) structured.

Most languages don’t bother with this sort of order, incorporating new words catch-as-catch-can. As a consequence, when we English speakers see a rare word, say, ocelot, we have nowhere to start in determining its meaning. Is it a cat? A bird? A small ocean? Unless we speak Nahuatl (a family of native North Mexican languages that includes Aztec), from which the word is derived, we have no clue. Where Wilkins promised systematicity, we have only etymology, the history of a word’s origin. An ocelot, as it happens, is a wild feline that gets its name from North Mexico; going further south, pumas are felines from Peru. The word jaguar comes from the Tupi language of Brazil. Meanwhile, the words leopard, tiger, and panther appear in ancient Greek. From the perspective of a child, each word is a fresh learning challenge. Even for adults, words that come up rarely are difficult to remember.

Among all the attempts at a perfect language, only one has really achieved any traction — Esperanto, created by one Ludovic Lazarus Zamenhof, born on December 15, 1859. Like Noam Chomsky, the father of modern linguistics, Zamenhof was the son of a Hebrew scholar. By the time he was a teenager, little Ludovic had picked up French, German, Polish, Russian, Hebrew, Yiddish, Latin, and Greek. Driven by his love for language and a belief that a universal language could alleviate many a social ill, Zamenhof aimed to create one that could quickly and easily be acquired by any human being.

Saluton! Ĉu vi parolas Esperanton? Mia nomo estas Gary.

[Hello. Do you speak Esperanto? My name is Gary.]

Despite Zamenhof’s best efforts, Esperanto is used today by only a few million speakers (with varying degrees of expertise), one tenth of 1 percent of the world’s population. What makes one language more prevalent than another is mostly a matter of politics, money, and influence. French, once the most commonly spoken language in the West, wasn’t displaced by English because English is better, but because Britain and the United States became more powerful and more influential than France. As the Yiddish scholar Max Weinreich put it, “A shprakh iz a diyalekt mit an armey un a flot” — “A language is a dialect with an army and a navy.”

With no nation-state invested in the success of Esperanto, it’s perhaps not surprising that it has yet to displace English (or French, Spanish, German, Chinese, Japanese, Hindi, or Arabic, to name a few) as the most prevalent language in the world. But it is instructive nonetheless to compare it to human languages that emerged naturally. In some ways, Esperanto is a dream come true. For example, whereas German has a half-dozen different ways to form the plural, Esperanto has only one. Any language student would sigh with relief.

Still, Esperanto gets into some fresh troubles of its own. Because of its strict rules about stress (the penultimate syllable, always), there is no way to distinguish whether the word senteme is made up of sent + em + e (“feeling” + “a tendency toward” + adverbial ending) or sen + tem + e (“without” + “topic” + adverbial ending). Thus the sentence La profesoro senteme parolis dum du horoj could mean either “The professor spoke with feeling for two hours” or (horrors!) “The professor rambled on for two hours.” The sentence Estis batata la demono de la viro is triply ambiguous; it can mean “The demon was beaten by the man,” “The demon was beaten out of the man,” or “The man’s demon was beaten.” Obviously, banishing irregularity is one thing, banishing ambiguity another.
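The senteme problem is, at bottom, a segmentation problem: the same string of sounds can be carved into morphemes in more than one way. A toy recursive sketch makes the point; the five-morpheme lexicon below is an illustrative fragment I have assumed for the example, not a real Esperanto dictionary:

```python
# Enumerate every way a word can be split into known morphemes.
MORPHEMES = {"sent", "em", "sen", "tem", "e"}  # illustrative fragment only

def segmentations(word):
    """Yield each list of lexicon morphemes that concatenates to `word`."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        if word[:i] in MORPHEMES:
            for rest in segmentations(word[i:]):
                yield [word[:i]] + rest

print(sorted(segmentations("senteme")))
# [['sen', 'tem', 'e'], ['sent', 'em', 'e']] -- both parses survive
```

Nothing in the word itself decides between the two parses; a listener must fall back on context, which is precisely the reliance on extra-linguistic knowledge that a perfect language was supposed to eliminate.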

Computer languages don’t suffer from these problems; in Pascal, C, Fortran, or LISP, one finds neither rampant irregularity nor pervasive ambiguity — proof in principle that languages don’t have to be ambiguous. In a well-constructed program, no computer ever wavers about what it should do next. By the very design of the languages in which they are written, computer programs are never at a loss.
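That determinism can be seen in miniature in any programming language’s grammar: where English leaves Put the book on the towel on the table with two readings, operator precedence fixes exactly one parse per expression. A small Python illustration:

```python
import ast

# "2 + 3 * 4" could, in principle, be grouped two ways, but the grammar
# permits exactly one: 2 + (3 * 4). The top of the parse tree is an
# addition whose right operand is the multiplication.
tree = ast.parse("2 + 3 * 4", mode="eval").body
print(isinstance(tree, ast.BinOp) and isinstance(tree.op, ast.Add))  # True
print(eval("2 + 3 * 4"))  # 14, never 20
```

Every conforming interpreter builds the same tree from the same string, which is exactly what no natural language can promise.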

Yet no matter how clear computer languages may be, nobody speaks C, Pascal, or LISP. Java may be the computer world’s current lingua franca, but I surely wouldn’t use it to talk about the weather. Software engineers depend on special word processors that indent, colorize, and keep track of their words and parentheses, precisely because the structure of computer languages seems so unnatural to the human mind.

To my knowledge, only one person ever seriously tried to construct an ambiguity-free, mathematically perfect human language, mathematically perfect not just in vocabulary but also in sentence construction. In the late 1950s a linguist by the name of James Cooke Brown constructed a language known as Loglan, short for “logical language.” In addition to a Wilkins-esque systematic vocabulary, it includes 112 “little words” that govern logic and structure. Many of these little words have English equivalents (tui, “in general”; tue, “moreover”; tai, “above all”), but the really crucial words correspond to things like parentheses (which most spoken languages lack) and technical tools for picking out specific individuals mentioned earlier in the discourse. The English word he, for example, would be translated as da if it refers to the first singular antecedent in a discourse, de if it refers to the second, di if it refers to the third, do if it refers to the fourth, and du if it refers to the fifth. Unnatural as this might seem, this system would banish considerable confusion about the antecedents of pronouns. (American Sign Language uses physical space to represent something similar, making signs in different places, depending on which entity is being referred to.) To see why this is useful, consider the English sentence He runs and he walks. It might describe a single person who runs and walks, or two different people, one running, the other walking; by contrast, in Loglan, the former would be rendered unambiguously as Da prano i da dzoru, the latter unambiguously as Da prano i de dzoru.
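The indexing scheme amounts to a simple lookup table keyed on order of first mention. The sketch below is a toy illustration of that idea only, not real Loglan grammar:

```python
# Loglan-style indexed pronouns: da, de, di, do, du point back to the
# 1st..5th singular antecedent introduced in the discourse.
PRONOUNS = ["da", "de", "di", "do", "du"]

def pronoun_for(antecedents, name):
    """Return the indexed pronoun for a previously mentioned individual."""
    return PRONOUNS[antecedents.index(name)]

discourse = ["John", "Bill"]           # order of first mention
print(pronoun_for(discourse, "John"))  # da -- "he (the first one)"
print(pronoun_for(discourse, "Bill"))  # de -- "he (the second one)"
```

Because each pronoun carries its antecedent’s index, “he runs and he walks” can never be ambiguous: same index, same person; different indices, different people.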

But Loglan has made even fewer inroads than Esperanto. Despite its “scientific” origins, it has no native speakers. On the Loglan website, Brown reports that at “The Loglan Institute… live-in apprentices learned the language directly from me (and I from them!), I am happy to report that sustained daily Loglan-only conversations lasting three-quarters of an hour or more were achieved,” but so far as I know, nobody has gotten much further. For all its ambiguity and idiosyncrasy, English goes down much smoother for the human mind. We couldn’t learn a perfect language if we tried.

As we have seen already, idiosyncrasy often arises in evolution when function and history clash, when good design is at odds with the raw materials already close at hand. The human spine, the panda’s thumb (formed from a wrist bone) — these are ramshackle solutions that owe more to evolutionary inertia than to any principle of good design. So it is with language too.

In the hodgepodge that is language, at least three major sources of idiosyncrasy arise from three separate clashes: (1) the contrast between the way our ancestors made sounds and the way we would ideally like to make them, (2) the way in which our words build on a primate understanding of the world, and (3) a flawed system of memory that works in a pinch but makes little sense for language. Any one of these alone would have been enough to leave language short of perfection. Together, they make language the collective kluge that it is: wonderful, loose, and flexible, yet manifestly rough around the edges.

Consider first the very sounds of language. It’s probably no accident that language evolved primarily as a medium of sound, rather than, say, vision or smell. Sound travels over reasonably long distances, and it allows one to communicate in the dark, even with others one can’t see. Although much the same might be said for smell, we can modulate sound much more rapidly and precisely, faster than even the most sophisticated skunk can modulate odor. Speech is also faster than communicating by way of physical motion; it can flow at about twice the speed of sign language.

Still, if I were building a system for vocal communication from scratch, I’d start with an iPod: a digital system that could play back any sound equally well. Nature, in contrast, started with a breathing tube. Turning that breathing tube into a means of vocal production was no small feat. Breathing produces air, but sound is modulated air, vibrations produced at just the right sets of frequencies. The Rube Goldberg-like vocal system consists of three fundamental parts: respiration, phonation, and articulation.

Respiration is just what it sounds like. You breathe in, your chest expands; your chest compresses, and a stream of air comes out. That stream of air is then rapidly divided by the vocal folds into smaller puffs of air (phonation), about 80 times a second for a baritone like James Earl Jones, as much as 500 times per second for a small child. From there, this more-or-less constant sound source is filtered, so that only a subset of its many frequencies makes it through. For those who like visual analogies, imagine producing a perfect white light and then applying a filter, so that only part of the spectrum shines through. The vocal tract works on a similar “source and filter” principle. The lips, the tip of the tongue, the tongue body, the velum (also known as the soft palate), and the glottis (the opening between the vocal folds) are known collectively as articulators. By varying their motions, these articulators shape the raw sound stream into what we know as speech: you vibrate your vocal cords when you say “bah” but not “pah”; you close your lips when you say “mah” but move your tongue to your teeth when you say “nah.”

Respiration, phonation, and articulation are not unique to humans. Since fish walked the land, virtually all vertebrates, from frogs to birds to mammals, have used vocally produced sound to communicate. Human evolution, however, depended on two key enhancements: the lowering of our larynx (not unique to humans but very rare elsewhere in the animal kingdom) and increased control of the ensemble of articulators that shape the sound of speech. Both have consequences.

Consider first the larynx. In most species, the larynx consists of a single long tube. At some point in evolution, our larynx dropped down. Moreover, as we changed posture and stood upright, it took a 90-degree turn, dividing into two tubes of more or less equal length, which endowed us with considerably more control of our vocalizations — and radically increased our risk of choking. As first noted by Darwin, “Every particle of food and drink which we swallow has to pass over the orifice of the trachea, with some risk of falling into the lungs” — something we’re all vulnerable to.[30]

Maybe you think the mildly increased risk of choking is a small price to pay, maybe you don’t. It certainly didn’t have to be that way; breathing and talking could have relied on different systems. Instead, our propensity for choking is one more clear sign that evolution tinkered with what was already in place. The result is a breathing tube that does double duty as a vocal tract — in occasionally fatal fashion.

In any event, the descended larynx was only half the battle. The real entrée into speech came from significantly increased control over our articulators. But here too the system is a bit of a kluge. For one thing, the vocal tract lacks the elegance of the iPod, which can play back more or less any sound equally well, from Moby’s guitars and flutes to hip-hop’s car crashes and gunshots. The vocal tract, in contrast, is tuned only to words. All the world’s languages are drawn from an inventory of about 90 sounds, and any particular language employs no more than half that number — an absurdly tiny subset when you think of the many distinct sounds the ear can recognize.

Imagine, for example, a human language that would refer to something by reproducing the sound it makes. I’d refer to my favorite canine, Ari, by reproducing his woof, not by calling him a dog. But the three-part contraption of respiration, phonation, and articulation can only do so much; even where languages allegedly refer to objects by their sounds — the phenomenon known as onomatopoeia — the “sounds” we refer to sound like, well, words. Woof is a perfectly well formed English word, a cross between, say, wool and hoof, but not a faithful reproduction of Ari’s vocalization (nor that of any other dog). And the comparable words in other languages each sound different, none exactly like a woof or a bark. French dogs go ouah, ouah, Albanian dogs go ham, ham, Greek dogs go gav, gav, Korean dogs go mung, mung, Italian dogs go bau, bau, German dogs go wau, wau: each language creates the sound in its own way. Why? Because our vocal tract is a clumsy contraption that is good for making the sounds of speech — and little else.

Tongue-twisters emerge as a consequence of the complicated dance that the articulators perform. It’s not enough to close our mouth or move our tongue in a basic set of movements; we have to coordinate each one in precisely timed ways. Two words can be made up of exactly the same physical motions performed in a slightly different sequence. Mad and ban, for example, each require the same four crucial movements — the velum (soft palate) widens, the tongue tip moves toward alveolar closure, the tongue body widens in the pharynx, and the lips close — but one of those gestures is produced early in one word (mad) and late in another (ban). Problems occur as speech speeds up — it gets harder and harder to get the timing right. Instead of building a separate timer (a clock) for each gesture, nature forces one timer into double (or triple, or quadruple) duty.

And that timer, which evolved long before language, is really good at only very simple rhythms: keeping things either exactly in phase (clapping) or exactly out of phase (alternating steps in walking, alternating strokes in swimming, and so forth). All that is fine for walking or running, but not if you need to perform an action with a more complex rhythm. Try, for example, to tap your right hand at twice the rate of your left. If you start out slow, this should be easy. But now gradually increase the tempo. Sooner or later you will find that the rhythm of your tapping will break down (the technical term is devolve) from a ratio of 2:1 to a ratio of 1:1.

Which returns us to tongue-twisters. Saying the words she sells properly involves a challenging coordination of movements, very much akin to tapping at the 2:1 ratio. If you first say the words she and sells aloud, slowly and separately, you’ll realize that the /s/ and /sh/ sounds have something in common — a tongue-tip movement — but only /sh/ also includes a tongue-body gesture. Saying she sells properly thus requires coordinating two tongue-tip gestures with one tongue-body gesture. When you say the words slowly, everything is okay, but say them fast, and you’ll stress the internal clock. The ratio eventually devolves to 1:1, and you wind up sticking in a tongue-body gesture for every tongue-tip gesture, rather than every other one. Voilà, she sells has become she shells. What “twists” your tongue, in short, is not a muscle but a limitation in an ancestral timing mechanism.

The peculiar nature of our articulatory system, and how it evolved, leads to one more consequence: the relation between sound waves and phonemes (the smallest distinct speech sounds, such as /s/ and /a/) is far more complicated than it needs to be. Just as our pronunciation of a given sequence of letters depends on its linguistic context (think of how you say ough when reading the title of Dr. Seuss’s book The Tough Coughs As He Ploughs the Dough), the way in which we produce a particular linguistic element depends on the sounds that come before it and after it. For example, the sound /s/ is pronounced in one way in the word see (with spread lips) but in another in the word sue (with rounded lips). This makes learning to talk a lot more work than it might otherwise be. (It’s also part of what makes computerized voice-recognition a difficult problem.)

Why such a complex system? Here again, evolution is to blame; once it locked us into producing sounds by articulatory choreography, the only way to keep up the speed of communication was to cut corners. Rather than produce every phoneme as a separate, distinct element (as a simple computer modem would), our speech system starts preparing sound number two while it’s still working on sound number one. Thus, before I start uttering the h in happy, my tongue is already scrambling into position in anticipation of the a. When I’m working on a, my lips are already getting ready for the pp, and when I’m on pp, I’m moving my tongue in preparation for the y.

This dance keeps the speed up, but it requires a lot of practice and can complicate the interpretation of the message.[31] What’s good for muscle control isn’t necessarily good for a listener. If you should mishear John Fogerty’s “There’s a bad moon on the rise” as “There’s a bathroom on the right,” so be it. From the perspective of evolution, the speech system, which works most of the time, is good enough, and that’s all that matters.

Curmudgeons of every generation think that their children and grandchildren don’t speak properly. Ogden Nash put it this way in 1962, in “Laments for a Dying Language”:

Coin brassy words at will, debase the coinage;

We’re in an if-you-cannot-lick-them-join age,

A slovenliness provides its own excuse age,

Where usage overnight condones misusage.

Farewell, farewell to my beloved language,

Once English, now a vile orangutanguage.

Words in computer languages are fixed in meaning, but words in human languages change constantly; one generation’s bad means “bad,” and the next generation’s bad means “good.” Why is it that languages can change so quickly over time?

Part of the answer stems from how our prelinguistic ancestors evolved to think about the world: not as philosophers or mathematicians, brimming with precision, but as animals perpetually in a hurry, frequently settling for solutions that are “good enough” rather than definitive.

Take, for example, what might happen if you were walking through the Redwood Forest and saw a tree trunk; odds are, you would conclude that you were looking at a tree, even if that trunk happened to be so tall that you couldn’t make out any leaves above. This habit of making snap judgments based on incomplete evidence (no leaves, no roots, just a trunk, and still we conclude we’ve seen a tree) is something we might call a logic of “partial matching.”

The logical antithesis, of course, would be to wait until we’d seen the whole thing; call that a logic of “full matching.” As you can imagine, he who waits until he’s seen the whole tree would never be wrong, but also risks missing a lot of bona fide foliage. Evolution rewarded those who were swift to decide, not those who were too persnickety to act.

For better or worse, language inherited this system wholesale. You might think of a chair, for instance, as something with four legs, a back, and a horizontal surface for sitting. But as the philosopher Ludwig Wittgenstein (1889-1951) realized, few concepts are really defined with such precision. Beanbag chairs, for example, are still considered chairs, even though they have neither an articulated back nor any sort of legs.

I call my cup of water a glass even though it’s made of plastic; I call my boss the chair of my department even though so far as I can tell she merely sits in one. A linguist or phylogenist uses the word tree to refer to a diagram on a page simply because it has branching structures, not because it grows, reproduces, or photosynthesizes. A head is the topside of a penny, the tail the bottom, even though the top has no more than a picture of a head, the bottom not a fiber of a wagging tail. Even the slightest fiber of connection suffices, precisely because words are governed by an inherited, ancestral logic of partial matches.[32]

Another idiosyncrasy of language, considerably more subtle, has to do with words like some, every, and most, known to linguists as “quantifiers” because they quantify, answering questions like “How much?” and “How many?”: some water, every boy, most ideas, several movies.

The peculiar thing is that in addition to quantifiers, we have another whole system that does something similar. This second system traffics in what linguists call “generics,” somewhat vague, generally accurate statements, such as Dogs have four legs or Paperbacks are cheaper than hardcovers. A perfect language might stick only to the first system, using explicit quantifiers rather than generics. An explicitly quantified sentence such as Every dog has four legs makes a nice, strong, clear statement, promising no exceptions. We know how to figure out whether it is true. Either all the dogs in the world have four legs, in which case the sentence is true, or at least one dog lacks four legs, in which case the sentence is false — end of story. Even a quantifier like some is fairly clear in its application; some has to mean more than one, and (pragmatically) ought not to mean every.

Generics are a whole different ball game, in many ways much less precise than quantifiers. It’s just not clear how many dogs have to have four legs before the statement Dogs have four legs can be considered true, and how many dogs would have to exhibit three legs before we’d decide that the statement is false. As for Paperbacks are cheaper than hardcovers, most of us would accept the sentence as true as a general rule of thumb, even if we knew that lots of individual paperbacks (say, imports) are more expensive than many individual hardcovers (such as discounted bestsellers printed in large quantities). We agree with the statement Mosquitoes carry the West Nile virus, even if only (say) 1 percent of mosquitoes carry the virus, yet we wouldn’t accept the statement Dogs have spots even if all the dalmatians in the world did.

Computer-programming languages admit no such imprecision; they have ways of representing formal quantifiers (

[DO THIS THING REPEATEDLY UNTIL EVERY DATABASE RECORD HAS BEEN EXAMINED]
) but no way of expressing generics at all. Human languages are idiosyncratic — and verging on redundant — inasmuch as they routinely exploit both systems, generics and the more formal quantifiers.
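The contrast is easy to make concrete in code. Here is a minimal sketch in Python (my own illustration, not anything from the text): explicit quantifiers like *every* and *some* map directly onto built-in constructs, while a generic like *Dogs have four legs* has no direct encoding at all.

```python
# A toy population of dogs (hypothetical data, for illustration only).
dogs = [
    {"name": "Rex", "legs": 4},
    {"name": "Spot", "legs": 4},
    {"name": "Tripod", "legs": 3},  # one three-legged exception
]

# "Every dog has four legs" -- a universal quantifier, crisply true or false.
every_dog_has_four_legs = all(d["legs"] == 4 for d in dogs)

# "Some dog has three legs" -- an existential quantifier, equally crisp.
some_dog_has_three_legs = any(d["legs"] == 3 for d in dogs)

print(every_dog_has_four_legs)  # False: Tripod is the exception
print(some_dog_has_three_legs)  # True

# A generic like "Dogs have four legs" has no such built-in encoding:
# the language supplies no threshold of tolerated exceptions, so a
# programmer would have to invent an arbitrary cutoff to approximate it.
```

Note that a single counterexample flips the universally quantified statement to false, which is exactly the "end of story" precision that generics lack.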

Why do we have both systems? Sarah-Jane Leslie, a young Princeton philosopher, has suggested one possible answer. The split between generics and quantifiers may reflect the divide in our reasoning capacity, between a sort of fast, automatic system on the one hand and a more formal, deliberative system on the other. Formal quantifiers rely on our deliberative system (which, when we are being careful, allows us to reason logically), while generics draw on our ancestral reflexive system. Generics are, she argues, essentially a linguistic realization of our older, less formal cognitive systems. Intriguingly, our sense of generics is “loose” in a second way: we are prepared to accept as true generics like Sharks attack bathers or Pit bulls maul children even though the circumstances they describe are statistically very rare, provided that they are vivid or salient — just the kind of response we might expect from our automatic, less deliberative system.

Leslie further suggests that generics seem to be learned first in childhood, before formal quantifiers; moreover, they may have emerged earlier in the development of language. At least one contemporary language (Pirahã, spoken in the Amazon Basin) appears to employ generics but not formal quantifiers. All of this suggests one more way in which the particular details of human languages depend on the idiosyncrasies of how our mind evolved.

For all that, I doubt many linguists would be convinced that language is truly a kluge. Words are one thing, sentences another; even if words are clumsy, what linguists really want to know about is syntax, the glue that binds words together. Could it be that words are a mess, but grammar is different, a “near-perfect” or “optimal” system for connecting sound and meaning?

In the past several years, Noam Chomsky, the founder and leader of modern linguistics, has taken to arguing just that. In particular, Chomsky has wondered aloud whether language (by which he means mainly the syntax of sentences) might come close “to what some super-engineer would construct, given the conditions that the language faculty must satisfy.” As linguists like Tom Wasow and Shalom Lappin have pointed out, there is considerable ambiguity in Chomsky’s suggestion. What would it mean for a language to be perfect or optimal? That one could express anything one might wish to say? That language is the most efficient possible means for obtaining what one wants? Or that language was the most logical system for communication anyone could possibly imagine? It’s hard to see how language, as it now stands, can lay claim to such grand credentials. The ambiguity of language, for example, seems unnecessary (as computers have shown), and language works in ways neither logical nor efficient (just think of how much extra effort is often required in order to clarify what our words mean). If language were a perfect vehicle for communication, infinitely efficient and expressive, I don’t think we would so often need “paralinguistic” information, like that provided by gestures, to get our meaning across.

As it turns out, Chomsky actually has something different in mind. He certainly doesn’t think language is a perfect tool for communication; to the contrary, he has argued that it is a mistake to think of language as having evolved “for” the purposes of communication at all. Rather, when Chomsky says that language is nearly optimal, he seems to mean that its formal structure is surprisingly elegant, in the same sense that string theory is. Just as string theorists conjecture that the complexity of physics can be captured by a small set of basic laws, Chomsky has, since the early 1990s, been trying to capture what he sees as the superficial complexity of language with a small set of laws.[33] Building on that idea, Chomsky and his collaborators have gone so far as to suggest that language might be a kind of “optimal solution… [to] the problem of linking the sensory-motor and conceptual-intentional systems” (or, roughly, connecting sound and meaning). They suggest that language, despite its obvious complexity, might have required only a single evolutionary advance beyond our inheritance from ancestral primates, namely, the introduction of a device known as “recursion.”

Recursion is a way of building larger structures out of smaller structures. Like mathematics, language is a potentially infinite system. Just as you can always make a number bigger by adding one (a trillion plus one, a googolplex plus one, and so forth), you can always make a sentence longer by adding a new clause. My favorite example comes from Maxwell Smart on the old Mel Brooks TV show Get Smart: “Would you believe that I know that you know that I know that you know where the bomb is hidden?” Each additional clause requires another round of recursion.
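The unbounded character of recursion can itself be sketched in a few lines of Python (a toy illustration of my own, not anything from the text): each extra level of embedding wraps the sentence in one more clause, so there is no longest sentence, just as there is no largest number.

```python
def embed(depth):
    """Wrap a base clause in `depth` rounds of "I know that ..." embedding."""
    if depth == 0:
        return "you know where the bomb is hidden"
    # Each recursive call adds one more clause around the smaller sentence.
    return "I know that " + embed(depth - 1)

print("Would you believe that " + embed(2) + "?")
# -> Would you believe that I know that I know that you know
#    where the bomb is hidden?
```

Like adding one to a number, another round of recursion always yields a longer (but still grammatical) sentence.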

There’s no doubt that recursion — or something like it — is central to human language. The fact that we can put together one small bit of structure (the man) with another (who went up the hill) to form a more complex bit of structure (the man who went up the hill) allows us to create arbitrarily complex sentences with terrific precision (The man with the gun is the man who went up the hill, not the man who drove the getaway car). Chomsky and his colleagues even have suggested that recursion might be “the only uniquely human component of the faculty of language.”

A number of scholars have been highly critical of that radical idea. Steven Pinker and the linguist Ray Jackendoff have argued that recursion might actually be found in other aspects of the mind (such as the process by which we recognize complex objects as being composed of recognizable subparts). The primatologist David Premack, meanwhile, has suggested that although recursion is a hallmark of human language, it is scarcely the only thing separating human language from other forms of communication. As Premack has noted, it’s not as if chimpanzees can speak an otherwise humanlike language that lacks recursion (which might consist of language minus complexities such as embedded clauses).[34] I’d like to go even further, though, and take what we’ve learned about the nature of evolution and humans to turn the whole argument on its head.

The sticking point is what linguists call syntactic trees, diagrams like this:

[syntactic tree diagram omitted]

Small elements can be combined to form larger elements, which in turn can be combined into still larger elements. There’s no problem in principle with building such things — computers use trees, for example, in representing the directory, or “folder” structures, on a hard drive.

But, as we have seen time and again, what is natural for computers isn’t always natural for the human brain: building a tree would require a precision in memory that humans just don’t appear to have. Building a tree structure with postal-code memory is trivial, something that the world’s computer programmers do many times a day. But building a tree structure out of contextual memory is a totally different story, a kluge that kind of works and kind of doesn’t.
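To see what effortless tree-building looks like with postal-code memory, here is a minimal sketch in Python (my own illustration; the labels and bracketing are assumptions, not the book's): every constituent occupies an exact, addressable slot, so the word string can always be recovered without confusion.

```python
# "the man who went up the hill" as a nested constituent structure.
# Each tuple is (label, child, child, ...); each word is a string leaf.
tree = (
    "NP",
    ("NP", ("Det", "the"), ("N", "man")),
    ("RelClause",
     ("Pro", "who"),
     ("VP",
      ("V", "went"),
      ("PP", ("P", "up"), ("NP", ("Det", "the"), ("N", "hill"))))),
)

def leaves(node):
    """Recover the word string by walking the tree left to right."""
    if isinstance(node, str):
        return [node]          # a leaf: a single word
    _label, *children = node   # skip the constituent label
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

print(" ".join(leaves(tree)))  # the man who went up the hill
```

For a machine, the whole edifice holds together no matter how deeply the clauses nest; the difficulty described below is ours, not the computer's.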

Working with simple sentences, we’re usually fine, but our capacity to understand sentences can easily be compromised. Take, for example, this short sentence I mentioned in the opening chapter:

People people left left.

Here’s a slightly easier variant:

Farmers monkeys fear slept.

Four words each, but enough to boggle most people’s minds. Yet both sentences are perfectly grammatical. The first means that some set of people who were abandoned by a second group of people themselves departed; the second means, roughly, that the farmers the monkeys were afraid of slept. These kinds of sentences — known in the trade as “center embeddings” (because they bury one clause directly in the middle of another) — are difficult, I submit, precisely because evolution never stumbled on proper tree structure.[35]

Here’s the thing: in order to interpret sentences like these and fully represent recursion (another classic is The rat the cat the mouse chased bit died), we would need to keep track of each noun and each verb, and at the same time hold in mind the connections between them and the clauses they form. Which is just what grammatical trees are supposed to do.

The trouble is, to do that would require an exact memory for the structures and words that have just been said (or read). And that’s something our postal-code-free memories just aren’t set up to do. If I were to read this book aloud and suddenly, without notice, stop and ask you to repeat the last sentence you heard — you probably couldn’t. You’d likely remember the gist of what I had said, but the exact wording would almost surely elude you.[36]

As a result, efforts to keep track of the structure of sentences become a bit like efforts to reconstruct the chronology of a long-ago sequence of events: clumsy, unreliable, but better than nothing. Consider, for example, a sentence like It was the banker that praised the barber that alienated his wife that climbed the mountain. Now, quick: was the mountain climbed by the banker, the barber, or his wife? A computer-based parser would have no trouble answering this question; each noun and each verb would be slotted into its proper place in a tree. But many human listeners end up confused. Lacking any hint of memory organized by location, the best we can do is approximate trees, clumsily kluging them together out of contextual memory. If we receive enough distinctive clues, it’s not a problem, but when the individual components of sentences are similar enough to confuse, the whole edifice comes tumbling down.

Perhaps the biggest problem with grammar is not the trouble we have in constructing trees, but the trouble we have in producing sentences that are certain to be parsed as we intend them to be. Since our sentences are clear to us, we assume they are clear to our listeners. But often they’re not; as engineers discovered when they started trying to build machines to understand language, a significant fraction of what we say is quietly ambiguous.[37]

Take, for example, this seemingly benign sentence: Put the block in the box on the table. An ordinary sentence, but it can actually mean two things: a request to put a particular block that happens to be in a box onto the table, or a request to take some block and put it into a particular box that happens to be on the table. Add another clause, and we wind up with four possibilities:

Put the block [(in the box on the table) in the kitchen].

Put the block [in the box (on the table in the kitchen)].

Put [the block (in the box) on the table] in the kitchen.

Put (the block in the box) (on the table in the kitchen).

Most of the time, our brain shields us from the complexity, automatically doing its best to reason its way through the possibilities. If we hear Put the block in the box on the table, and there’s just one block, we don’t even notice the fact that the sentence could have meant something else. Language alone doesn’t tell us that, but we are clever enough to connect what we hear with what it might mean. (Speakers also use a range of “paralinguistic” techniques, like pointing and gesturing, to supplement language; they can also look to their listeners to see if they appear to understand.)

But such tricks can take us only so far. When we are stuck with inadequate clues, communication becomes harder, one reason that emails and phone calls are more prone to misunderstandings than face-to-face communication is. And even when we speak directly to an audience, if we use ambiguous sentences, people may just not notice; they may think they’ve understood even when they haven’t really. One eye-opening study recently asked college students to read aloud a series of grammatically ambiguous[38] sentences like Angela shot the man with the gun (in which the gun might have been either Angela’s murder weapon or a firearm the victim happened to be carrying). They were warned in advance that the sentences were ambiguous and permitted to use as much stress (emphasis) on individual words as they liked; the question was whether they could tell when they successfully put their meaning across. It turns out that most speakers were lousy at the task and had no clue about how bad they were. In almost half the cases in which subjects thought that they had successfully conveyed a given sentence’s meaning, they were actually misunderstood by their listeners! (The listeners weren’t much better, frequently assuming they’d understood when they hadn’t.)

Indeed, a certain part of the work that professional writers must do (especially if they write nonfiction) is to compensate for language’s limitations: to scan carefully to make sure that there’s no vague he that could refer to either the farmer or his son, no misplaced commas, no dangling (or squinting) modifiers, and so forth. In Robert Louis Stevenson’s words, “The difficulty of literature is not to write, but to write what you mean.” Of course, sometimes ambiguity is deliberate, but that’s a separate story; it’s one thing to leave a reader with a vivid sense of a difficult decision, another to accidentally leave a reader confused.

Put together all these factors — inadvertent ambiguity, idiosyncratic memory, snap judgments, arbitrary associations, and a choreography that strains our internal clocks — and what emerges? Vagueness, idiosyncrasy, and a language that is frequently vulnerable to misinterpretation — not to mention a vocal apparatus more byzantine than a bagpipe made up entirely of pipe cleaners and cardboard dowels. In the words of the linguist Geoff Pullum, “The English language is, in so many ways, a flawed masterpiece of evolution, loaded with rough bits, silly design oversights, ragged edges, stupid gaps, and malign and perverted irregularities.”

As the psycholinguist Fernanda Ferreira has put it, language is “good enough,” not perfect. Most of the time we get things right, but sometimes we are easily confused. Or even misled. Few people, for example, notice that something’s amiss when you ask them, “How many animals did Moses bring onto the ark?”[39] Even fewer realize that a sentence like More people have been to Russia than I have is either (depending on your point of view) ungrammatical or incoherent.

If language were designed by an intelligent engineer, interpreters would be out of a job, and Berlitz’s language schools would be drive-thrus, no lifetime commitment required. Words would be systematically related to one another, and phonemes consistently pronounced. You could tell all those voice-activated telephone menu systems exactly where you wanted them to go — and be assured they’d understand the message. There would be no ambiguity, no senseless irregularity. People would say what they mean and mean what they say. But instead, we have slippage. Our thoughts get stuck on the tip of the tongue when we can’t recall a specific word. Grammar ties us in knots (is it The keys to the cabinet are… or The keys to the cabinet is… Oh, never mind…). Syntax on the fly is hard.

This is not to say that language is terrible, only that it could, with forethought, have been even better.

The rampant confusion that characterizes language is not, however, without its logic: the logic of evolution. We co-articulate, producing speech sounds differently, depending on the context, because we produce sound not by running a string of bits through a digital amplifier to electromagnetically driven speakers but by thrashing our tongues around three-dimensional cavities that originated as channels for digestion, not communication. Then, as She sells seashells by the seashore, our tortured tongues totally trip. Why? Because language was built, rapidly, on a haphazard patchwork of mechanisms that originally evolved for other purposes.
