Chapter 3
Soylent Music
Blind Joggers
Joggers love their headphones. If you ask them why, they’ll tell you music keeps them motivated. The right song can transform what is by all rights an arduous half hour of ascetic masochism into an exhilarating whirlwind (or, in my case, into what feels like only 25 minutes of ascetic masochism). Music-driven joggers may be experiencing a pleasurable diversion, but to the other joggers and bikers in their vicinity, they’re Tasmanian Devils. In choosing to jog to the beat of someone else’s drum rather than their own, headphone-wearing joggers have “blinded” themselves to the sounds of the other movers around them. Headphones don’t prevent joggers from deftly navigating the trees, stumps, curbs, and parked cars of the world, because these things can be seen as one approaches them. But when you’re moving in a world with other movers, things not currently in front of you can quickly arrive in front of you. That’s when the headphoned jogger stumbles . . . and crashes into the crossing jogger, passing biker, or first-time tricycler.
These music-blinded movers may be a menace to our streets, but they can serve to educate us all about one of our underappreciated powers: using sound alone, we know where people are around us, and we know the nature of their movement. I’m sitting in a coffee shop as I write this, and when I close my eyes, I can sense the movement all around me: a clop of boots just passed to my right; a person with jingling keys just walked in front of me from my right to my left, and back again; and the pitter-patter of a child just meandered way out in front of me. I sense where they are, their direction of motion, and their speed. I also sense their gait, such as whether they are walking or running. And I can often tell more than this: I can distinguish a brisk from a shuffling walk, an angry stomp from a happy prance; and I can even recognize a complex behavior like turning and stopping to drop a dirty tray in a bin, slowing to open a door, or reversing direction to fetch a forgotten coffee. My auditory system carries out these mover-detection computations even when I’m not consciously attending to them. That’s why I’m difficult to sneak up on (although they keep trying!), and why I only rarely find myself saying, “How long has that cheerleading squad been doing jumping jacks behind me?!” That almost never happens to me because my auditory system is keeping track of where people are and roughly what they’re doing, even when I’m otherwise occupied.
We can now see why joggers with ears unencumbered by headphones almost never crash into feral dogs or runaway grandpas in wheelchairs: they may not see the dog or grandpa, but they hear their movement through space, and can dynamically modulate their running to avoid both and be merrily on their way. Without headphones, joggers are highly sensitive to the sounds of cars, and can track their movement: that car is coming around the bend; the one over there is reversing directly toward me; the one above me is falling; and so on. Joggers in headphones, on the other hand, have turned off their movement-detection systems, and should be passed with caution! And although they are a hazard to pedestrians and cyclists, the people they put at greatest risk are themselves. After a collision between a jogger and an automobile, the automobile typically only needs a power wash to the grille.
How does your auditory system serve as a movement-tracking system? In addition to sensing whether a mover is to your left or right, in front or behind, and above or below—a skill that depends on the shape, position, and number of ears you have—you possess specialized auditory software that interprets the sounds of movers and generates a good guess as to the nature of the mover’s movement through space. Your software has evolved to give you four kinds of information about a mover: (i) his distance from you, (ii) his directedness toward (or away from, or at an angle to) you, (iii) his speed, and (iv) his behavior or gait. How, then, does your auditory system infer these four kinds of information? As we will see in this and the following chapters, (i) distance is gleaned from loudness, (ii) directedness toward you is cued by pitch, (iii) speed is inferred by the number of footsteps per second, and (iv) behavior and gait are read from the pattern and emphasis of footsteps. Four fundamental parameters of human movement, and four kinds of auditory cues: (i) loudness, (ii) sound frequency, (iii) step rate, and (iv) temporal pattern and emphasis. (See Figure 13.) Your auditory system has evolved to track these cues because of the supreme value of knowing what everyone is doing nearby, and where.
This is where things get interesting. Even though joggers without headphones are not listening to music, their auditory systems are listening to fundamentally music-like constituents. Consider the four auditory movement cues mentioned just above (and shown on the right of Figure 13). Loudness? That’s just pianissimo versus piano versus forte and so on. (This is called “dynamics” in music, a term I will avoid because it brings confusion in the context of a movement theory of music.) Sound frequency? That’s roughly pitch. Step rate? That’s tempo. And the gait pattern? That’s akin to rhythm and beat. The four fundamental auditory cues for movement are, then, mighty similar to (i) loudness, (ii) pitch, (iii) tempo, and (iv) rhythm. (See Figure 14.) These are the most fundamental ingredients of music, and yet, there they are in the sounds of human movers. The most informative sounds of human movers are the fundamental building blocks of music!
Figure 13. The four properties of human movers (left) are inferred from the four respective auditory stimuli (right).
Figure 14. Central to music are the four musical properties in the center column, which map directly onto the auditory cues for sensing human movement.
The importance of loudness, pitch, tempo, and rhythm to both music and movement is, as we will see, more than a coincidence. The similarity runs deep—something speculated on ever since the Greeks[1]. Music is not just built with the building blocks of movement, but is actually organized like movement, thereby harnessing our movement-recognition auditory mechanisms. Headphoned joggers, then, don’t just miss out on the real movement around them—they pipe fictional movement into their ears, making them even more hazardous than a jogger wearing earplugs.
Much of the rest of this book is about how music came into the lives of us humans, how it gets into our brains, and why it affects us as it does. In short, we will see that music moves us because it literally sounds like moving.
The Secret Ingredient
When I was a teenager, my mother began listening to French instructional programs in order to brush up. She was proud of me when I began sitting and listening with her. “Perhaps my son isn’t a square physics kid after all,” she thought. And, in fact, I found the experience utterly enthralling. After many months, however, my mother’s pride turned to worry, because whenever she attempted to banter in even the most elementary French with me, I would stare back, dumbfounded. “Why isn’t this kid learning French?” she fretted.
What I didn’t tell my mother was that I wasn’t trying to learn French. Why was I bothering to listen to a program I could not comprehend? I will let you in on my secret in a moment, but in the meantime I can tell you what I was not listening to it for: the speech sounds. No one would set aside a half hour each day for months in order to listen to unintelligible speech. Foreign speech sounds can pique our curiosity, but we don’t go out of our way to hear them. If people loved foreign speech sounds, there would be a market for them; we would set our alarm clocks to blare German at 5:30 a.m., listen to Navajo on the way to work in the car, and put on Bushmen clicks as background for our dinner parties. No. I was not listening to the French program for the speech sounds. Speech doesn’t enthrall us—not even in French.
Whereas foreign speech sounds don’t make it as a form of entertainment, music is quintessentially entertaining. Music does get piped into our alarm clocks, car radios, and dinner parties. Music has its own vibrant industry, whereas no one is foolish enough to see a business opportunity in easy-listening foreign speech sounds. And this motivates the following question. Why is music so evocative? Why doesn’t music feel like listening to speech sounds, or animal calls, or garbage disposal rumbles? Put simply: why is music nice to listen to?
In an effort to answer, let’s go back to the French instructional program and my proud, and then concerned, mother. Why was I joining my mom each day for a lesson I couldn’t comprehend, and had no intention of comprehending? Truth be told, it wasn’t an audiotape we were listening to, but a television show. And it wasn’t the meaningless-to-me speech sounds that lured me in, but one of the actors. A young French actress, in particular. Her hair, her smile, her mannerisms, her pout . . . but I digress. I wasn’t watching for the French language so much as for the French people, one in particular. Sorry, Mom!
What was evocative about the show and kept me wanting more was the human element. The most important thing in the lives of our ancestors was the other people around them, and it is on the faces and bodies of other people that we find the most emotionally evocative stimuli. So when one finds a human artifact that is capable of evoking strong feelings, my hunch is that it looks or sounds human in some way. This is, I suggest, an important clue to the nature of music.
Let’s take a step back from speech and music, and look for a moment at evocative and nonevocative visual stimuli in order to see whether evocativeness springs from people. In particular, consider two kinds of visual stimuli, writing and color—each an area of my research covered in my previous book, The Vision Revolution.
Writing, I have argued, has culturally evolved over centuries to look like natural objects, and to have the contour structures found in three-dimensional scenes of opaque objects. The nature that underlies writing is, then, “opaque objects in 3-D,” and that is not a specifically human thing. Writing looks like objects, not humans, and thus only has the evocative power expected of opaque objects: little or none. That’s why most writing—like the letters and words on this page—is not emotionally evocative to look at. (See top left of Figure 15.) Colors, on the other hand, are notoriously evocative—people have strong preferences regarding the colors of their clothes, cars, and houses, and we sense strong associations between color and emotions. I have argued in my research and in The Vision Revolution that color vision in us primates—our new-to-primates red-green sensitivity in particular—evolved to detect the blood physiology modulations occurring in the skin, which allow us to see color signals indicating emotional state and mood. Color vision in us primates is primarily about the emotions of others. Color is about humans, and it is this human connection to color that is the source of color’s evocativeness. And although, unlike color, writing is not generally evocative, not all writing is sterile. For example, “V” stimuli have long been recognized as one of the most evocative geometrical shapes for warning symbols. But notice that “V” stimuli are reminiscent of (exaggerations of) “angry eyebrows” on angry faces. Color is “about” human skin and emotion, and “V” stimuli may be about angry eyebrows—so the emotionality in each one springs from a human source. (See top right of Figure 15.) We see, then, that the nonevocative visual signs look like opaque, not-necessarily-human objects, and the evocative visual signs look like human expressions. I have summarized this in the top row of the table in Figure 15.
Figure 15. Evocative stimuli (right column) are usually made with people, whereas nonevocative stimuli (left column) are more physics-related and sterile.
Do we find that evocativeness springs from the same human source within the auditory domain? Let’s start with speech. As we discussed in the previous chapter, speech sounds like solid-object physical events. “Solid-object physical events” amount to a sterile physics category of sound, akin in nerdiness to “three-dimensional world of opaque objects.” We are capable of mimicking lots of nonhuman sounds, and speech, then, amounts to yet another mimicry of this kind. Ironically, human speech does not sound human at all. It is consequently not evocative. (See the bottom left square of Figure 15 for speech’s place in the table.) Which brings us back to music, the other major kind of auditory stimulus people produce besides speech. Just as color is evocative but writing is not, music is evocative but speech sounds are not. This suggests that, just as color gets its emotionality from people, perhaps music gets its emotionality from people. Could it be that music, like Soylent Green, is made out of people? (Music has been placed at the bottom right of the table in Figure 15.)
If we believe that music sounds like people, then we greatly reduce the range of worldly sounds music may be mimicking. That amounts to progress: music is probably mostly not about birdsong, wind, water, math, and so on. But, unfortunately, humans make a wide variety of sounds, some in fundamentally different categories, such as speech, coughs, sneezes, laughter, heartbeats, chewing, walking, hammering, and so on. We’ll need a more specific theory than one that simply says music is made from people. Next, though, we ask why there isn’t any purely visual domain that is as exciting to us as music.
Going Solo
If the visual system and auditory system had competitive streaks, they might argue about which modality has the most compelling art. Each would be allowed to cite as examples only cases exclusively within its own modality: vision-only versus audition-only. This is a difficult contest to officiate. Should vision be allowed to cite all the features of visual design found in culture, such as clothes, cars, buildings, and everyday objects? If so, it would have a big leg up on audition, which is not nearly so involved in the design of our physical artifacts. Let’s agree not to include these, by virtue of an “official rule” that the art must be purchased by people for the purpose merely of enjoying the aesthetics, with no other functional benefit. That is, is it vision or audition that commands the greatest portion of the market for art and entertainment?
If you set it up in this way, audition trounces vision. Although the visual modality is found in huge markets like television, video games, and movies, these rely on audition as well. People put visual art on their walls, but that typically amounts to just a few purchases, whereas it is common to find people who own thousands of music albums. The market for the purely visual arts is minuscule compared to that for audition. This is counterintuitive, because if you ask most of us to name the most beautiful things we know of, we are likely to respond with a list of visuals. But when we vote with our pocketbooks, audition wins the solo artist contest. Why is that?
One possible explanation is simply that it is easier to carry on with the chores of life while music is in the background, whereas the visual arts inherently get in the way. Try driving or working or throwing a dinner party while admiring the Mona Lisa. But I suspect it is more than this. If it were merely because of the difficulty of enjoying visual arts while having a life, one might expect us to want to stare at beautiful visual art all day, if only we had nothing else pressing to do. Most of us, however, don’t exactly fancy the idea of watching visual images all day (without sound). Listening to music all day, however, sounds quite charming! And, in fact, many of us do spend our days listening to music.
The stark inequality of vision and audition in this competition for “best solo performer” in the arts is due to a fundamental ecological asymmetry. When we see things in the world, those things are typically making noise. Seeing without hearing therefore feels strange, unnatural, or as if it is missing something. But hearing without seeing is commonplace, because we hear all sorts of things we cannot see—when our eyes are closed, when the source is behind us, when the source is occluded, or when the environment is dark. Sights nearly always come with sounds, but sounds very commonly come without sights. And that’s why audition is happy to be a solo artist, but vision isn’t. Music is the single-modality artist extraordinaire.
While we now have some idea why there’s no solely visual art that rivals music, we still have barely begun our quest to understand why music is so compelling that we are willing to purchase thousands of albums.
At the Heart of a Theory of Music
If music sounds human in fundamental respects, as our discussion in the section before last suggested, then it seems to have made heroic efforts to obfuscate this fact. I readily admit that music doesn’t sound human to me—not consciously, at least. But recall the section titled “Below the Radar” from Chapter 1, where I said that we don’t necessarily expect cultural artifacts to mimic nature “all the way up.” It may be the case that much of our lower-level auditory apparatus thinks that music sounds like humans, but that because of certain high-level dissimilarities, we—our conscious selves—don’t notice it. How, then, can I hope to convince anyone? I have to convince you, after all, not your lower-level auditory areas!
What we need are some qualifying hurdles that a theory of music should have to leap over to gain a hearing . . . hurdles that, once cleared, will serve to persuade some of Earth’s teeming music buffs that music does indeed sound like people moving. Toward this end, here are four such hurdles—questions that any aspiring theory of music might hope to answer.
Brain: Why do we have a brain for music?
Emotion: Why is music emotionally evocative?
Dance: Why do we dance?
Structure: Why is music organized the way it is?
If a theory can answer all four questions, then I believe we should start paying attention.
To help clarify what I mean by these questions, let’s run through them in the context of a particular lay theory of music: the “heartbeat” theory. Although there is probably more than just one heartbeat theory held by laypeople, the main theme appears to be that a heart has a beat, as music does. Of course, we don’t typically hear our own heartbeat, much less others’, so when the theory is fleshed out, it is often suggested that the fundamental beat was laid down when we were in utero. One of the constants of the good fetal life was Momma’s heartbeat, and music takes us back to those oceanic, one-with-the-universe feelings we long ago lost. I’m not suggesting that this is a good theory, by any means, but it will aid me in illustrating the four hurdles. I would be hesitant, by the way, to call this “lub-dub” theory of music crazy—our understanding of the origins of music is so woeful that any nonspooky theory is worth a look. Let’s see how lub-dubs fare with our four hurdles for a theory of music.
The first hurdle was this: “Why do we have a brain for music?” That is, why are our brains capable of processing music? For example, fax machines are designed to process the auditory modulations occurring in fax machine communication, but to our ears fax machines sound like a fairly continuous screech-brrr—we don’t have brains capable of processing fax machine sounds. Music may well sound homogeneously screechy-brrrey to nonhuman ears, but it sounds richly dynamic and structured to our ears. How might the lub-dub theorist explain why we have a brain for music? Best I can figure, the lub-dubber could say that our in-utero days of warmth and comfort get strongly associated with Momma’s heartbeat, and the musical beat taps into those associations, bringing back warm fetus feelings. One difficulty for this hypothesis is that learned associations often don’t last forever, so why would those Momma’s-heartbeat associations be so strong among adults? There are lots of beatlike stimuli outside of the womb: some are nice, some are not nice. Why wouldn’t those out-of-the-womb sounds become the dominant associations, with Momma’s heartbeat washed away? And if Momma’s lub-dubs are, for some reason, not washed away, then why aren’t there other in-utero experiences that forever stay with us? Why don’t we, say, like to wear artificial umbilical cords, thereby evoking recollections of the womb? And why, at any rate, do we think we were so happy in the womb? Maybe those days, supposing they leave any trace at all, are associated with nothing whatsoever. (Or perhaps with horror.) The lub-dub theory of music does not have a plausible story for why we have a brain ready and eager to soak up a beat.
The lub-dub theory of music origins also comes up short in the second major demand on a theory of music: that it explain why music is evocative, or emotional. This was the subject of the previous section. Heartbeats are made by people, but heartbeat sounds amount to a one-dimensional parameter—faster or slower rate—and are not sufficiently rich to capture much of the range of human emotion. Accordingly, heartbeats won’t help much in explaining the range of emotions music can elicit in listeners. Psychophysiologists who look for physiological correlates of emotion take a variety of measurements (e.g., heart rate, blood pressure, skin conductance), not just one. Heart sounds aren’t rich enough to tug at all music’s heartstrings.
Heartbeats also fail the “dance” hurdle. The “dance” requirement is that we explain why it is that music should elicit dance. This fundamental fact about music is a strange thing for sounds to do. In fact, it is a strange thing for any stimulus to do, in any modality. For lub-dubs, the difficulty for the dance hurdle is that even if lub-dubs were fondly recalled by us, and even if they managed to elicit a wide range of emotions, we would have no idea why they should provoke post-uterine people to move, given that even fetuses don’t move to Momma’s heartbeat.
The final requirement of a theory of music is that it must explain the structure of music, a tall order. Lub-dubs do have a beat, of course, but heartbeats are far too simple to begin to explain the many other structural regularities found in music. For starters, where is the melody?
Sorry, Mom (again). Thanks for the good times in your uterus, but I’m afraid your heartbeats are not the source of my fascination with music.
Although the lub-dub theory fails the four requirements for a theory of music, the music-sounds-like-human-movement theory of music, as we will see, has answers to all four. We have a brain for music because possessing auditory mechanisms for recognizing what people are doing around us is clearly advantageous. Music is evocative because it sounds like human behaviors, many of which are expressive in their nature—something we will discuss further in a few pages. Music gets us dancing because, as we will also discuss, we social apes are prone to mimic the movements of others. And, finally, the movement theory is sufficiently powerful that it can explain a lot of the structure of music—that will require the upcoming chapter and the Encore (at the end of the book) to describe.
Underlying Overtones
The heartbeat theory suffered cardiac arrest, but it was never intended as a serious contender. It was just a prop for illustrating the four hurdles. Speech, on the other hand, is a much more plausible starting point as a foundation for music. But haven’t we already discussed speech? Wasn’t that what the previous chapter was about? We concluded then that speech sounds like solid-object physical events: the structural regularities found among solid-object events are reflected in the phonological patterns of human speech. Speech is all about the phonemes, and how closely they mimic nature’s pattern of hits, slides, and ring sounds. Music, on the other hand, cares not a whit for phonemes. Although music can often have words to be sung, music usually gets its identity not from the words, but from the rhythm and tune. Two songs with different words, but with the same rhythm and pitch sequence, are deemed by us to be the same tune, just with different words. That’s why we use the phrase “put words to the music”—because the words (and the phonemes) are not properly part of the meat of the music. The most central auditory feature of speech—its phonological characteristics—is mostly irrelevant to music, making speech an unlikely place to look for the origins of music.
Music is not only missing the phonological core of speech, but it is also missing another fundamental aspect of speech, its most evocative aspect: the meaning, or semantics. If music has its source in speech, and is evocative because of the evocative nature of speech, then why wouldn’t music require words with meaning, whether metaphorical or direct? Yet, as mentioned above, neither phonology nor words is an essential ingredient of music. (Although phonology and words are key ingredients in poetry.)
If music comes from speech, then it doesn’t come from the phonological patterns of speech, or from the semantics of speech. Although these core functions of speech are dead ends for a theory of music, there is another aspect of speech I have purposely glossed over. People overlay the sterile solid-object event sounds of speech with emotional overtones. We add intonation, a pitchlike property. We vary the emphasis of the words in a sentence, reminiscent of the way rhythm bestows emphasis in music (for instance, the first beat in a measure usually has enhanced emphasis). We vary the timing of the word utterances, akin to the temporal patterns of rhythm in music. And we sometimes modulate the overall loudness of our voices, like a musical crescendo or diminuendo. These prosody-related emotional overtones turn Stephen Hawking computer-voice speech into regular human speech. And these emotional overtones can be understood even in foreign speech, where our ears can often recognize the glib, the mournful, the proud, and the angry. We’re just not sure what they are glib, mournful, proud, or angry about.
So it is not quite true that speech sounds are sterile. Rather, it is the phonological solid-object event sounds that are sterile. The overtones of speech, on the other hand, are dripping with human emotion. Might these overtones underlie music? In an effort to answer, let’s discuss the four questions at the heart of any theory of music, the ones I referred to earlier as “brain,” “emotion,” “dance,” and “structure.”
Do we have a brain for the overtones of speech? An overtone theory of music would like to say that music “works” on our brains because it taps into speech overtone recognition mechanisms. Are we likely to have neural mechanisms for recognizing overtones of speech? Although I am suggesting in this book that we did not evolve to possess speech recognition mechanisms, we primates have been making nonspeech vocalizations (cries, laughs, shrieks, growls, moans, sighs, and so on) for tens of millions of years, and surely we have evolved neural mechanisms to recognize them. Perhaps the overtones of speech come from our ancient nonspeech vocalizations, and they get laid on top of the solid-object physical event sounds of speech like a whipped cream of evocativeness, a whipped cream our auditory system knows how to taste. An overtone-based theory of music, then, does have a plausible story to tell about why our brain would be highly efficient at recognizing overtones.
Can overtones potentially explain the evocativeness of music, the second hurdle we had discussed for any theory of music? Of course! Overtones are emotional, used in vocalization to be evocative. If music mimics emotional overtones, then it is easy to grasp how music can be evocative.
Can an overtone theory of music explain dance, the third hurdle I mentioned earlier? One can see how the emotional nonspeech vocalization of other people around us might provoke us into action of some kind—that’s probably why people are vocalizing in the first place. That’s a start. But we would like to know why hearing overtones would not just tend to provoke us to do stuff, but more specifically, make us move in a time-locked fashion to the emotional vocalizations. I have not been able to fathom any overtone-related story that could explain this, and the absence of any potential connection to dance is a hurdle that an overtone theory stumbles over.
Finally, can overtones explain the structure of music? Do the overtones of speech possess the patterns of pitch, loudness, and rhythm found in music? There is, at least, enough structure floating around in the prosody of speech that one can imagine it might be rich enough to help explain the structure found in music. But despite the nice confluence between ingredients in the overtones of speech and certain similar ingredients in music, overtones appear to be a very different beast from music. First and foremost, what’s missing in the overtones of speech is a beat, and a rhythm time-locked to a beat. That’s the one thing the lub-dub theory of music captured, but it is one of the most glaring shortcomings of overtone-based approaches, and it ultimately takes overtones of speech out of the running as a basis for a theory of music.
Before leaving speech for more fertile grounds—in fact, the next section is about sex—consider the two hurdles where overtones appeared promising: “brain” and “emotion.” I suggested earlier in this section that overtones could rely on ancient human nonlinguistic vocalizations, but there is another potential foundation for overtones’ evocative nature: the sounds of people moving. Rather than music coming from the overtones of speech, perhaps both music and overtones have their foundation in the more fundamentally meaningful sound patterns of humans’ expressive movements. (And perhaps this is the source of the intersections between music and speech in the brain discerned by Aniruddh D. Patel of the Neurosciences Institute in La Jolla, and other researchers.)
How About Sex?
Music does not appear to have its origins in the beating heart or in the overtones of speech. That’s where I stood on the problem as recently as 2007, when I had just left Caltech for RPI. I was confident that music was not lub-dubs or speech, but I had no idea what music could be. I did, however, have a good idea of some severe constraints any theory of music must satisfy, namely the four hurdles we discussed earlier: brain, emotion, dance, and structure. After racking my brain for some months, and perhaps helped along by the fact that my wife was several months delayed in following me across country to my new job, it struck me: how about sex?
Reputable scientific articles—or perhaps I saw this in one of the women’s magazines on my wife’s bedside table—indicate that to have sex successfully, satisfying both partners and (if so desired) optimizing the chances of conception, the couple’s movements should be in sync with each other. Accordingly, one might imagine that we have been selected to respond to the rhythmic sex sounds of our partner by feeling the urge to match our own movement to his or hers. Evolution would select against people who did not “dance” upon hearing sex moves, and it would also select against people who responded with the sex dance every time a handshake was sufficiently vigorous. The auditory system would thus come to possess mechanisms for accurately detecting the sexual sounds of our partner. A “sex theory of music” of this kind has, then, a story for the “brain” hurdle.
In addition to satisfying the “brain” hurdle, the sex theory also has the beginnings of stories for the other three hurdles. Emotion? Sex concerns hot, steamy bodies, which is, ahem, evocative. Dance? The sex theory explains why we would feel compelled to move to the beat, thereby potentially addressing the “dance” hurdle. (In fact, perhaps the “sex theory” could explain why dance moves are so often packed with sexual overtones.) And, finally, structure? The sounds of sex often have a beat, the most essential structural feature of music a theory needs to explain.
I was on a roll! But before getting Hugh Hefner on the phone to go over the implications, I needed to figure out how to test the hypothesis. That’s simple, I thought. If music sounds like sex, then we should find the signature sounds of sex in music. The question then became, what are the signature sounds of sex? What I needed was to collect data from pornography. That, however, would surely land me in a heap of trouble of one kind or another, so I went with the next best thing: anthropology. I began searching for studies of human sexual intercourse, and in particular for “scores” notating the behavior and vocalizations of couples in the act. I also found scores of this kind for nonhuman primates—not my bag—which, I discovered, contain noticeably more instances of “biting” and “baring teeth” than most human encounters. My hope was to find enough of these so that I could compile an average “score” for a sexual encounter, and use it as a predictor of the length, tempo, pitch modulation, loudness modulation, and rhythm modulation of music.
I couldn’t find but a handful of such scores, and I did not have the chutzpah to acquire scores of my own. So I gave it up. I could have pushed harder to find data, but it seemed clear to me that, despite its initial promise, sex was far too narrow to possibly explain music. If music sounded like sex, then why isn’t all music sexy? And why does music evoke such a wide range of emotions, far beyond those that occur in the heat of sex? And how can the simple rhythmic sounds of sex possibly have enough structure to explain musical structure? Without answers to these questions, it was clear that I would have to take sex off the table.
Enough with the things I don’t think can explain music (heartbeats, speech, and sex)! It is about time I begin saying what I think music does sound like. And let’s edge closer to that by examining what music looks like.
Believe Your Eyes and Earworms
It is natural to assume that the visual information streaming into our eyes determines the visual perceptions we end up with, and that the auditory information entering our ears determines the events we hear. But the brain is more complicated than this. Visual and auditory information interact in the brain, and the brain utilizes both to guess what single scene to render a perception of. For example, the research of Ladan Shams, Yukiyasu Kamitani, and Shinsuke Shimojo at Caltech has shown that we perceive a single flash as a double flash if it is paired with a double beep. And Robert Sekuler and others from Brandeis University have shown that if a sound occurs at the time when the images of two balls pass through each other on a screen, the balls are instead perceived to have collided and reversed direction. These and other results of this kind demonstrate the interconnectedness of visual and auditory information in our brain. Visual ambiguity can be reduced by auditory information, and vice versa. And, generally, both are brought to bear in the brain’s attempt to guess about what’s out there.
Your brain, then, does not consist of independent visual and auditory systems, with separate troves of visual and auditory knowledge about the world. Instead, vision and audition talk to one another, and there are regions of cortex responsible for making vision and audition fit one another. These regions know about the sounds of looks and the looks of sounds. Because of this, when your brain hears something but cannot see it, your brain does not just sit there and refrain from guessing what it might have looked like. When your auditory system makes sense of something, it will have a tendency to activate visual areas, eliciting imagery of its best guess as to the appearance of the stuff making the sound. For example, when you hear the sound of your neighbor’s tree rustling, an image of its swaying, lanky branches may spring to mind. The mewing of your cat heard far away may evoke an image of it stuck high up in that tree. And the pumping of your neighbor’s kid’s BB gun can bring forth an image of the gun being pointed at Foofy way up there.
Your visual system, then, has strong opinions about the likely look of the things you hear. And, to get back to music, we can use the visual system’s strong opinions as an aid in gauging music’s meaning. In particular, we can ask your visual system what it thinks the appropriate visual is for music. If, for example, the visual system responds to music with images of beating hearts, then it would suggest, to my disbelief, that music mimics the sounds of heartbeats. If, instead, the visual system responds with images of pornography, then it would suggest that music sounds like sex. You get the idea.
But to get the visual system to act like an oracle, we need to get it to speak. How are we to know what the visual system thinks music looks like? One approach is to simply ask what visuals are routinely associated with music. For example, when people create imagery of musical notes, what does it look like? One cheap way to find out is simply to do a Google (or any search engine) image search on the term “musical notes.” You might think such a search would merely return images of simple notes on the page. However, that is not what one finds. To my surprise, actually, most of the images are like the one in Figure 16, with notes drawn in such a way that they appear to be moving through space. Notes in musical notation don’t look anything like this, and actual musical notes have no look at all (because they are sounds). And yet we humans seem prone to visually depict notes in lively motion.
Figure 16. Musical notes tend to be visualized like this, a clue to their meaning.
Could these images of notes in motion be due to a more mundane association? Music is played by people, and people have to move to play their instruments. Could this be the source of the movement-music association? I don’t think so, because the movement suggested in these images of notes doesn’t look anything like an instrument being played. In fact, it is common to show images of an instrument with the notes beginning their movement through space from the instrument: these notes are on their way somewhere, not tied to the musician’s key-pressing or back-and-forth swaying.
Could it be that the musical notes are depicted as moving through space because sound waves move through space? The difficulty with this hypothesis is that all sound moves through space. All sound would, if this were so, be visually rendered as moving through space, but that’s not how we portray most sounds. For example, speech is not usually visually rendered as moving through space. Another difficulty is that the musical notes in these images are usually meandering, but sound waves don’t meander—sound waves go straight. A third problem with the notion that sound waves are the basis for the visual metaphor is that we never see sound waves in the first place.
Another possible counterhypothesis is that musical notes are visually depicted in motion because all auditory stimuli are caused by underlying events that involve movement of some kind. The first difficulty, as with sound waves, is that not all sound, by a long shot, is visually rendered as in motion. The second difficulty is that, while it is true that sounds are typically generated by movement of some kind, it need not be movement of an entire object through space. Moving parts within the object may make the noise, without the object going anywhere. In fact, the three examples I gave at the start of this section—leaves rustling, Foofy mewing, and the BB gun pumping—are noises without any bulk movement of the object (the tree, Foofy, or the BB gun, respectively). The musical notes in these images, on the other hand, really do seem to be moving their whole selves across space.
Music is like rustling leaves, Foofy, BB guns, and human speech, in that it is not made by bulk movements through space. And yet music appears uniquely likely to be visually depicted as notes moving through space. And not only moving, but meandering. When visually rendered, music looks alive and in motion (often along the ground)—just what one might expect if music’s secret is that it sounds like people moving.
A Google image search on “musical notes” is one way to try to discern what the visual system thinks music looks like. Another is simply to ask ourselves: what is the most common visual display shown during music? That is, if people were to make videos to go with music, what would the videos tend to look like? Luckily for us, people do make videos to go with music! They’re called music videos, of course. And what do they look like? The answer is so obvious that it hardly seems worth noting: music videos commonly show people moving about, usually in a manner that is time-locked to the music, very often dancing. As obvious as it is that music videos typically show people moving, we must remember to ask ourselves why music isn’t typically visually associated with something very different. Why aren’t music videos mostly of rivers, avalanches, car races, windblown grass, lions hunting, fire, or bouncing balls? It is because, I am suggesting, our brain thinks that humans moving about is what music should look like . . . because it thinks that humans moving about is what music sounds like.
Musical notes are rendered as meandering through space. Music videos are built largely from people moving, and in a manner time-locked to the music. That begins to suggest that the visual system is under the impression that music sounds like human movement. But if that’s really what the visual system thinks, then it should have more opinions than just “music sounds like movement.” It should have opinions about what kind of movement music sounds like, and therefore, more exactly what the movement should look like. Do our visual systems have opinions this precise? Are we picky about the visual movement that goes with music?
You bet we are! That’s choreography. It’s not OK to play a video of the Nutcracker ballet during Beatles music, nor is it OK to play a video of the Nutcracker to the music of the Nutcracker, but with a small time lag between them. Video of human movement has to have all the right moves at the right time to be the right fit for music.
These strong opinions about what music looks like make perfect sense if music mimics human movement sounds. In real life, when people carry out complex behaviors, their visible movements are tightly choreographed with the sounds they make—because the sight and the sound arise from the same event. When you hear movement, you expect to see that same movement. Music sounds to your brain like human movement, and that’s why, when your brain hears music, it expects that any visual of it should match up with it.
We just used your brain’s visual system as an oracle to divine the meaning of music, and it answered, “People moving.” Let’s now use your brain in another oracle-like fashion. If music has been culturally selected to fit the brain, then let’s look into which pieces of music are the best fit for the brain, with the idea that these pieces may be the best representatives of what music has been culturally selected to sound like. But how can we gauge which pieces of music are the best fits? One thought is that “symptoms” of a piece of music fitting the brain really well might be that the brain would process it especially easily, remember it easily, and internally hear it easily. Are there pieces of music like this?
Yes, there are! They’re called earworms—those songs with a tendency to get stuck in people’s heads. These pieces of music fit the brain so well that they can sometimes become nuisances. Earworms, then, may be great representatives of the fundamental structural features that have been selected for in music. What are the common qualities of pieces of music that become earworms?
When he was an RPI graduate student, Aaron Fath got interested in this question. He was dissatisfied with the standard line that songs become earworms because they are highly repetitive. Most songs are highly repetitive, he reasoned. Instead, he began to notice that a large fraction of earworms have a particular dance or move that goes along with the music. Examples of songs tightly connected to a particular movement include “I’m a Little Teacup,” “Macarena,” “YMCA,” “Chicken Dance,” “If You’re Happy and You Know It,” and “Head, Shoulders, Knees and Toes.” Let’s call these pieces movement-explicit. He also noticed that many other earworms were songs that accompanied specific visual movements (like a commercial jingle on television) or were dance songs (even if no specific movements were associated with them).
Aaron used two existing catalogs of earworms: a top 17 list of earworms from James Kellaris of the University of Cincinnati (obtained by polling 559 students), and a list of “top annoying earworms” from an online poll at the website Keepers of Lists (one user posted 220 songs, and 80 other users voted on whether or not they were earwormy; Aaron took the 38 songs having more than 10 votes). Movement-explicit pieces accounted for 23.5 percent and 18.4 percent of these lists. To gauge whether these are unusually large percentages of movement-explicit pieces, he sampled the #8 song on the Billboard Hot 100 Chart every nine months from 1983 to the present, and among these 38 songs, none were of the movement-explicit variety. As a second gauge, he sampled the #1 songs for each year from 1955 through 2006 (defined by Aaron—differently than Billboard does it—as the song released in a year that was #1 on the Billboard Hot 100 for the greatest number of weeks, and thus had the most staying power). Of these 52 songs, only one was of the movement-explicit kind (namely, “Macarena”).
These data suggest that earworms are disproportionately movement-explicit: about one-fifth of the earworms had specific dance moves that went with them, whereas less than 2 percent of top pop songs are of this kind. Our speculation is that songs become earworms not because they are movement-explicit so much as because they are consistent with the sounds of people moving—movement-explicit songs just happen to be under especially strong selection pressure to be consistent with the sounds of people moving. Although only a fifth of the earworms were of the movement-explicit kind, many of the others seemed to be in the “accompaniment” or “dance” category (although we have not yet tried to operationally measure these and compare them to control data sets). An alternative possibility is that when a song becomes tightly linked to movement, it is that very association that helps make it an earworm. This would suggest that music becomes more brain-worthy when packaged together with a motor program, and this, too, would appear to point to the music-is-movement theory.
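The arithmetic behind "about one-fifth" versus "less than 2 percent" can be checked with a short sketch. The list sizes and percentages come from the text; the raw counts (4 of 17 and 7 of 38) are inferred from those percentages and are an assumption, as is the decision to pool the two earworm lists and the two control samples. The one-sided Fisher exact test is added here purely as an illustration and is not an analysis reported in the text.

```python
from math import comb

# (movement-explicit songs, total songs) per list.
# Earworm counts are INFERRED from the quoted percentages (assumption):
# 23.5% of 17 is about 4 songs, 18.4% of 38 is about 7 songs.
earworm_lists = {"Kellaris top 17": (4, 17), "Keepers of Lists": (7, 38)}
control_lists = {"Billboard #8 sample": (0, 38), "Yearly #1 songs": (1, 52)}


def percent(hits, total):
    """Share of movement-explicit songs, as a percentage with one decimal."""
    return round(100 * hits / total, 1)


def pooled(lists):
    """Add up hits and totals across several lists (ignores any overlap)."""
    hits = sum(h for h, _ in lists.values())
    total = sum(t for _, t in lists.values())
    return hits, total


def fisher_one_sided(a, n1, b, n2):
    """Chance of seeing at least `a` hits in group 1 (size n1) if the
    a + b hits were scattered at random over all n1 + n2 songs."""
    hits, total = a + b, n1 + n2
    upper = min(hits, n1)
    tail = sum(comb(hits, k) * comb(total - hits, n1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, n1)


ew_hits, ew_total = pooled(earworm_lists)
ct_hits, ct_total = pooled(control_lists)
print("earworms:", percent(ew_hits, ew_total), "percent")
print("controls:", percent(ct_hits, ct_total), "percent")
print("one-sided p:", fisher_one_sided(ew_hits, ew_total, ct_hits, ct_total))
```

Pooling treats the lists as independent samples, which may not hold if the same song appears on both earworm lists, so the p-value is best read as a rough indication and not a formal result.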
It looks like music may be the sounds of human movement. We asked the expert on how things look: your visual system. Like presenting a deeply encrypted code to an oracle, we asked for the visual system’s interpretation of that enigmatic thing called music, and it had a clear and resounding response: music sounds like people moving and doing things, and thus must be visually rendered as humanlike motion in sync with the musical sounds. We also queried your brain in another fashion: we asked it which songs it most revels in, which ones are so earwormalicious that the brain loves to internally sing them over and over again. And the brain answered: the more movement-explicit songs are more likely to be the earwormy ones. The brain seems to be under the impression that music sounds like people moving.
Brain and Emotion
The opinion of visual systems and the hints of earworms are interesting and motivating, but we can’t just take them at their word. In order to make a solid case that music sounds like human movement, I need to show that the music-is-movement theory can leap the four hurdles we discussed earlier: “brain,” “emotion,” “dance,” and “structure.” Let’s begin in this section with the first two.
For the “brain” hurdle, I need to say why our brain would have mechanisms for making sense of music and responding to it so eagerly and intricately. For the theory that music sounds like human movement, then, we must ask ourselves if it is plausible that we have brain mechanisms for processing the sounds of humans doing stuff. The answer is yes. Of course we have humans-doing-stuff auditory mechanisms! The most important animals in the life of any animal are its conspecifics (other animals of the same species), and so our brains are well equipped to communicate with and “read” our fellow humans. Face recognition is one familiar example, and color vision, with its ability to detect emotional signals on the skin, is another one (which I discussed in detail in my previous book, The Vision Revolution). It would be bizarre if we had no specialized auditory mechanisms for sensing the sounds of other people carrying out behaviors. Actions speak louder than words—the sounds we make when we act are often a dead giveaway to what we’re up to. And we’ve been making sounds when we move for many millions of years, plenty long enough to have evolved such mechanisms. The music-sounds-like-movement hypothesis, then, can make a highly plausible case that it satisfies the “brain” hurdle. Our brains surely have evolved to possess specialized mechanisms to hear what people are doing.
How about the second hurdle for a theory of music, the one labeled “emotion”? Could the mundane sounds of people moving underlie our love affair with music? As we discussed at the start of the chapter, music is evocative—it can sound joyous, aggressive, melancholy, amorous, tortured, strong, lethargic, and so on. I said then that the evocative nature of music suggests that it must be “made out of people.” Human movement is, obviously, made of and by people, but can human movement truly be evocative? Of course! The ability to infer emotional states from the bodily movements of others comes via several routes. First and foremost, when people carry out behaviors they move their bodies, movements that can give away what the person is doing; knowing what the person is doing can, in turn, be crucial for understanding the actor’s emotion or mood. Second, the actor’s emotional state is often cued by its side effects on behavior, such as when an exhausted person staggers. And third, some bodily movements serve as direct emotional signals, more akin to facial expressions and color signals: bodily movements can be proud, strutting, threatening, ebullient, jaunty, sulking, arrogant, inviting, and so on. Human movement can, then, certainly be evocative. And unlike evocative facial expressions and skin color signals, which are silent, our evocative bodily expressions and movements make noises. The sounds of human movement not only are “made from people,” then, but they can be truly evocative, fulfilling the “emotion” hurdle.
An example will help to clarify how the sounds of human movement can be emotionally evocative. Michael Zampi, then an undergraduate at RPI, was interested in uncovering the auditory cues for happy, sad, and angry walkers. He first noted that University of Tübingen researchers Claire L. Roether, Lars Omlor, Andrea Christensen, and Martin A. Giese had observed that happy walkers tend to lean back and have large arm and leg swings, angry walkers lean forward and have large arm and leg swings, and sad walkers tend to lean forward and have attenuated arm and leg swings.
“What,” Michael asked, “are the distinctive sounds for those three gaits?” He reasoned that leaning back leads to a larger gap between the sound of the heel and the sound of the toe. And, furthermore, larger arm and leg swings tend to lend greater emphasis to any sounds made by the limbs in between the footsteps (later I will refer to these sounds as “banging ganglies”). Given this, Michael could conclude that happy walkers have long heel-toe gaps and loud between-the-steps gait sounds; angry walkers have short heel-toe temporal gaps and loud between-the-steps gait sounds; and sad walkers have short heel-toe gaps and soft between-the-steps gait sounds. But are these cues sufficient to elicit the perception that a walker is happy, angry, or sad?
Michael created simple rhythms, each with three drum strikes per beat: a toe strike on the beat, a heel strike just before the beat, and a between-the-steps hit on the off-beat. Starting from a baseline audio track (an intermediate heel-toe gap and a between-the-steps sound with intermediate emphasis), Michael created versions with shorter and longer heel-toe gaps, and versions with less emphasized and more emphasized between-the-steps sounds. Listeners were told they would hear the sounds of people walking in various emotional states, and then the listeners were presented with the baseline stimulus, followed by one of the four modulations around it. They were asked to volunteer an emotion term to describe the modulated gait. As can be seen in Figure 17, subjects had a tendency to perceive the simulated walker’s emotion accurately.
Figure 17. Each column is for one of the three tracks having the sounds modulating around the baseline to indicate the labeled emotion. The numbers show how many subjects volunteered the emotions “angry,” “happy,” “sad,” or other emotion words for each of the three tracks. One can see that the most commonly perceived emotion in each column matches the gait’s emotion.
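To make the structure of these stimuli concrete, here is a minimal sketch of how such a gait rhythm might be laid out in time. The three strikes per beat and the cue pairings (long or short heel-toe gap, loud or soft between-the-steps sound) follow the description above. Every number (beat period, gap lengths, loudness levels) and every name (`Strike`, `gait_rhythm`, `track_for`) is hypothetical, since the study's actual values and tools are not given here.

```python
from dataclasses import dataclass


@dataclass
class Strike:
    time: float      # seconds from the start of the track
    kind: str        # "heel", "toe", or "between"
    loudness: float  # relative emphasis between 0 and 1


# Cue pairings described in the text: (heel-toe gap, between-the-steps emphasis).
EMOTION_CUES = {
    "happy": ("long", "loud"),
    "angry": ("short", "loud"),
    "sad": ("short", "soft"),
}

# Hypothetical values; the real stimuli may have used very different ones.
GAP_SECONDS = {"short": 0.04, "baseline": 0.08, "long": 0.12}
LOUDNESS = {"soft": 0.2, "baseline": 0.5, "loud": 0.9}


def gait_rhythm(n_beats, beat_period=0.6,
                heel_toe_gap=GAP_SECONDS["baseline"],
                between_loudness=LOUDNESS["baseline"]):
    """Three strikes per beat: heel just before the beat, toe on the beat,
    and a between-the-steps hit on the off-beat."""
    strikes = []
    for i in range(1, n_beats + 1):
        beat = i * beat_period
        strikes.append(Strike(beat - heel_toe_gap, "heel", 1.0))
        strikes.append(Strike(beat, "toe", 1.0))
        strikes.append(Strike(beat + beat_period / 2, "between", between_loudness))
    return strikes


def track_for(emotion, n_beats=4):
    """Build the rhythm whose cues match the given emotion's gait."""
    gap, emphasis = EMOTION_CUES[emotion]
    return gait_rhythm(n_beats,
                       heel_toe_gap=GAP_SECONDS[gap],
                       between_loudness=LOUDNESS[emphasis])


for strike in track_for("happy", n_beats=2):
    print(f"{strike.time:.2f}s  {strike.kind:<8} loudness={strike.loudness}")
```

Calling `gait_rhythm(4)` with its defaults gives the baseline track, and `track_for("sad")` gives a short-gap, soft-swing variant. Turning the strike list into actual audio is left out, as that depends on whichever drum samples or synthesis tools are at hand.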
This pilot study of Michael Zampi’s is just the barest beginning in our attempts to make sense of the emotional cues in the sounds of people moving. The hope is that by understanding these cues, we can better understand how music modulates emotion, and perhaps why genres differ in their emotional effects.
If music has been culturally selected to sound like human movement, then it is easy to see why we’d have a brain for it, and easy to see why music can be so emotionally moving. But why should music be so motionally moving? The music-is-movement theory has to explain why the sounds of people moving should impel other people to move. That’s the third hurdle over which we must leap: the “dance” hurdle, which we take up next.
Motionally Moving
Group activities with toddlers are hopeless. Just as you get the top toddler into position at the peak of the toddler pyramid, several on the bottom level have begun crying, pooping, or wandering away. Toddlers prefer to treat their day-care mates as objects to ignore, climb over, or hit. And just try getting a dozen of them to do anything in unison, like performing “the wave” in the audience at a roller derby! If aliens observed us humans only during toddlerhood, they might conclude that we don’t get on well in groups, and that, lacking a collaborative spirit, we will be easy prey when they invade.
But brain-thirsty aliens might come to a very different conclusion if they dropped in on a day-care center during music time. Flip on “The Wheels on the Bus Go Round and Round,” and a dozen randomly wandering, cantankerous droolers begin shaking their stinky bottoms in unison. Aliens might surmise that music is some kind of marching order, a message from the human commander to activate gyrations against an invading enemy.
Dancing toddlers, of course, play little or no role in explaining why we haven’t been invaded by aliens, but they do raise an important question. Why do toddlers seem to be compelled to move to the music? And, more generally, why is this a tendency we keep into adulthood? At this very moment of writing, I am, in fact, swaying slightly to Tchaikovsky’s Piano Concerto No. 1. Don’t I have better things to do? Yes, I do—like write this book. Yet I keep pausing to hear the music, and end up ever so slightly dancing. It is easy to understand why people dance when a gun is fired at their feet like in old Westerns, but music is so much less substantial than lead, and yet it can get us going as surely as a Colt 45. What is the source of music’s power to literally move us, like rats to the Pied Piper’s flute?
We can make sense of this mystery in light of the theory that music sounds like human movement. If music sounds like movement, and music makes us move, then it is not so much music that is making us move, but the sound of human movement. And that’s not at all mysterious! Of course the behaviors of others may elicit responsive actions from us. For example, if my three-year-old son barrels headlong toward my groin, I quickly move my hands downward for protection. If he throws a rubber ball at my head, I try to catch it. And if he suddenly decides he’d rather not wear his bathing shorts, I quickly pull them back up. Not only do I behave in reaction to my son’s behavior, but my behavior must be timed appropriately, lest he careen into me, bean me with a ball, or strip buck-naked and get a head start in his dash away. Music sounds like human behavior, and human behavior often elicits appropriately timed behavioral responses in others, so it is not a surprise, in light of the theory, that music elicits appropriately timed behavioral responses.
It’s easy to see why three-year-old aggressive and streaking behaviors would prompt a well-timed response in others (especially parents). Another common category of human behavior that elicits a behavioral response in others, in fact one of the most common, is expressive behavior. Human expressions are for other humans to see or hear or smell, precisely in order to prompt them to modulate their behavior. Sometimes another person’s response may be a complex whole-body behavior (I give my wife my come-hither look, she responds by going thither), and sometimes the other person’s behavioral response may simply be an expression of emotion (I grimace and rub my newly minted bruise, and my son responds by smiling). If music is good at getting us to move, then, in this light, one suspects that music must usually sound not merely like movement that kicks (literally, in my son’s case) listeners into moving in response, but, more specifically, like human emotional or expressive behaviors.
Sound triggering movement. That’s starting to sound a bit like dance. To more fully understand dance, we must grasp one further thing: contagious behaviors—behavioral expressions that tend to spread. For example, if I smile, you may smile back; and if I scowl, you’ll likely scowl back. Even yawns are catching. And contagious behavior is not confined to the face. Nervous behavior can spread, and angry bodily stances are likely to be reciprocated. If you raise your hands high into the air, a typical toddler will also do so, at which point you have a clear tickle shot. Even complex whole-body behaviors are contagious, accounting for why, for example, people in a crowd often remain passive bystanders when someone is being attacked (other people’s inaction spreads), and how a group of people can become a riotous mob (other people’s violent behavior spreads). By the way, have you yawned yet?
Music, then, may elicit movement for the same reasons that a cartoon smiley face can elicit smiles in us: music can often sound like contagious expressive human behavior and movement, and trigger a similar expressive movement in us. Music may not be marching orders from our commander, but it can sometimes cue our emotional system so precisely that we feel almost compelled to march in lockstep with music’s fictional mover. And this is true whether we are adults or toddlers. When music is effective at getting us to mimic the movement it mimics, we call it dance music, be it a Strauss waltz or a Grateful Dead flail.
The music-sounds-like-movement theory can, then, explain why music provokes us to dance—the third of the four hurdles a theory of music must leap over. The fourth and final hurdle concerns the structure of music, and it will take the upcoming chapter and the Encore chapter to make the case that music has the signature structure of humans moving.
Don’t Roll Over, Beethoven
The case for my theory is strong, I believe, and I hope to convince you that music sounds like human movement. If I am correct, then, with the movement-meaning of music in hand, we will be in a position to create a new generation of “supermusic”: music deliberately designed to be even more aesthetically pleasing, by far, than previous generations of music. Music has historically been “trying” to shape itself like expressive human behaviors, in the sense that that was what was culturally selected for. But individual composers didn’t know what music was trying to be—composers didn’t know that music works best when tapping into our human-movement auditory mechanisms. Musical works have heretofore tended to be sloppy mimickers of human movement. With music decoded, however, we can tune it perfectly for our mental software, and blow our minds. You’re toast, Beethoven! I’ve unraveled your secrets!
No. Just kidding. I’m afraid that the music research I’m describing to you will do no such thing, even if every last claim I make is true. To see why the magic of Beethoven is not unraveled by my theory, consider photographic art. Some photographs have evocative power; they count as art. Some photographs, however, are just photographs, and not art. What exactly distinguishes the art from the “not” is a genuine mystery, and certainly beyond me. But there is something that is obviously true about art photographs: they are photographs. Although that’s obvious to us, imagine for a moment that four-dimensional aliens stumble upon a pile of human artifacts, and that in the pile are photographs. Being four-dimensional creatures, they have poor intuitions about what a three-dimensional world looks like from a particular viewpoint inside it. Consequently, our human photographs are difficult to distinguish from the many other human artifacts that are flat with stuff printed upon them, such as wallpaper, clothing, and money. If they are to realize that the photographs are, in fact, photographs—two-dimensional representations of our 3-D world—they are going to have to discover this.
Luckily for them, one alien scientist who has been snooping around these artifacts has an idea. “What if,” he hypothesizes, “some of the flat pieces of paper with visual marks are photographs? Not of our 4-D world, but of their human 3-D world?” In an effort to test this idea, he works out what the signature properties of photographs of 3-D worlds would be, such as horizons, vanishing points, projective geometry, field of focus, partial occlusion, and so on. Then he searches among the human artifacts for pieces of paper or fabric having these properties. He can now easily conclude that wallpaper, clothing, and money are not photographs. And when he finds some of our human photographs, he’ll be able to establish that they are photographs, and convince his colleagues. This alien’s research would amount to a big step forward for those aliens interested in understanding our world and how we perceive it. A certain class of flat artifacts is meaningful in a way they had not realized, and now they can begin to look at our photographs in this new light, and see our 3-D world represented in them.
The theory of music I am defending here is akin to the alien’s theory that some of those flat artifacts are views of 3-D scenes. To us, photographs are obviously of 3-D scenes; but to the aliens this is not at all obvious. And, similarly, to our auditory system, music quite obviously is about human action; but to our conscious selves this is not in the least obvious (our conscious selves are aliens to music’s deeper meaning).
To see why this book cannot say what makes music good, consider what this alien scientist’s discovery about photographs would not have revealed. Unbeknownst to the alien, some of the photographs are considered by us humans to be genuine instances of art, and the rest of the photographs are simply photographs. This alien’s technique for distinguishing photographs from nonphotographs is of no use at all for distinguishing the artful photographs from the mere photographs. Humanity’s greatest pieces of photographic art and the most haphazard kitsch would all be in the same bag, labeled “views of a 3-D world.” By analogy, the most expressive human movement sounds and the most run-of-the-mill human movement sounds are all treated the same by the ideas I describe in this book; they are all in the same bag, labeled “human movement sounds.” Although it is expressive human movements that probably drive the structure of music, I have enough on my hands just trying to make the beginnings of a case that music sounds like human movement. Just as it is easier for the four-dimensional alien to provide evidence for photograph-ness than to provide evidence for artsy-photograph-ness, it is much easier for me to provide evidence that music is human-movement-ish than to provide evidence that it is expressive-human-movement-ish. Photographic art is views of 3-D scenes, but views of 3-D scenes need not be photographic art. Similarly, music is made of the sounds of humans moving, but the sounds of humans moving need not be—and usually are not—music.
Relax, Beethoven—no need to roll over. If the music-sounds-like-movement theory is correct, then it is best viewed as a cipher key for decoding music. It gives our conscious, scientific selves the ability to translate the sounds of music back into the movements of humans (something our own lower-level auditory areas already know how to do). But knowing how to read the underlying movement meaning of music does not mean one knows how to write music. Just as I can read great literature but cannot create it, a successful music-is-movement theory will allow us to read the meaning of music but not to compose it. Creating good music requires knowing which human movements are most expressive, and making music sound like that. But a theory of expressive human movements is far harder to formulate than a theory of human movements generally. All I can hope to muster is a general theory of the sounds of human movements, and so the theory will be, at best, a decoder ring, not a magical composer of great music.
But a decoder ring may nevertheless be a big step forward for composers. Composers have thus far managed to create great music—great auditory stories of human movements, in our theory’s eyes—without explicitly understanding what music means. With a better understanding of the decoder ring, composers can consciously employ it in the creative process. Similarly, the four-dimensional alien has much better odds of mimicking artistic photography once he has figured out what photographs actually look like. Until then, the alien’s attempts at artistic photography wouldn’t even look like photography. (“Is this photographic art?” the alien asks, holding up a plaid pattern.) The aliens must know what basic visual elements characterize photography before they can take it to the next level, start to guess which arrangements of those elements are superior, and try their own tentacles at art photography. You can’t have expressive photography without photography, and you can’t have expressive human movement sounds without human movement sounds. The theory of music I’m arguing for, then, does not explain what makes great music. But the theory would nevertheless be a big step toward such an explanation. Like the alien’s basic discovery, it will enable us to pose hypotheses about why some music is great—by referring to the expressive movements and behaviors it depicts.
This decoder ring will, then, be helpful to composers, but it cannot substitute for the expressive antennae composers use to create musical art. For choreographers and movie composers, this decoder ring is potentially much more important. Choreographers and movie composers are deeply concerned with the mapping from music to movement (the principal domain of choreography) or from movement to music (the principal domain of movie composers), and so a decoder ring that translates one to the other is a potential holy grail. In reality, though, it’s not as simple as that. A given piece of music probably does not determine particular dance moves (although your auditory system may pick out just one movement)—a good choreographer needs an artistic head to pick the most appropriately expressive movement of the many possible movements consistent with the music. And for any given movie visual, a good film composer will have to use his or her artistic talents to find an appropriately expressive theme for the scene. Any music–movement decoder made possible by this book won’t put choreographers or movie composers out of work, but it may serve as an especially useful tool for these disciplines, providing new, biologically justified constraints on what makes a good music–movement match.
So, what is great music? I don’t know. My only claim is that it tends to be written in the language of human movement. Music is movement. But it is not the case that movement is music. Just as most stories are not interesting, most possible movement sounds are not pleasing. What good composers know in their bones is which movement sounds are expressive, and which sequences of movement sounds tell an evocative story. But they also know even deeper in their bones which sounds sound like humans moving, and that is what we’ll be discussing next, in the upcoming chapter and in the Encore.
[1] Researchers in this tradition include Alf Gabrielsson, Patrick Shove, Bruno H. Repp, Neil P. McAngus Todd, Henkjan Honing, Jacob Feldman, and Eric F. Clarke (see his Ways of Listening).