Because important things go in a case, you’ve got a skull for your brain, a plastic sleeve for your comb, and a wallet for your money.
Now, for the first time, we are observing the brain at work in a global manner with such clarity that we should be able to discover the overall programs behind its magnificent powers.
The mind, in short, works on the data it receives very much as a sculptor works on his block of stone. In a sense the statue stood there from eternity. But there were a thousand different ones beside it, and the sculptor alone is to thank for having extricated this one from the rest. Just so the world of each of us, howsoever different our several views of it may be, all lay embedded in the primordial chaos of sensations, which gave the mere matter to the thought of all of us indifferently. We may, if we like, by our reasonings unwind things back to that black and jointless continuity of space and moving clouds of swarming atoms which science calls the only real world. But all the while the world we feel and live in will be that which our ancestors and we, by slowly cumulative strokes of choice, have extricated out of this, like sculptors, by simply rejecting certain portions of the given stuff. Other sculptors, other statues from the same stone! Other minds, other worlds from the same monotonous and inexpressive chaos! My world is but one in a million alike embedded, alike real to those who may abstract them. How different must be the worlds in the consciousness of ant, cuttle-fish, or crab!
Is intelligence the goal, or even a goal, of biological evolution? Steven Pinker writes, “We are chauvinistic about our brains, thinking them to be the goal of evolution,”1 and goes on to argue that “that makes no sense…. Natural selection does nothing even close to striving for intelligence. The process is driven by differences in the survival and reproduction rates of replicating organisms in a particular environment. Over time, the organisms acquire designs that adapt them for survival and reproduction in that environment, period; nothing pulls them in any direction other than success there and then.” Pinker concludes that “life is a densely branching bush, not a scale or a ladder, and living organisms are at the tips of branches, not on lower rungs.”
With regard to the human brain, he questions whether the “benefits outweigh the costs.” Among the costs, he cites that “the brain [is] bulky. The female pelvis barely accommodates a baby’s outsized head. That design compromise kills many women during childbirth and requires a pivoting gait that makes women biomechanically less efficient walkers than men. Also a heavy head bobbing around on a neck makes us more vulnerable to fatal injuries in accidents such as falls.” He goes on to list additional shortcomings, including the brain’s energy consumption, its slow reaction time, and the lengthy process of learning.
While each of these statements is accurate on its face (although many of my female friends are better walkers than I am), Pinker is missing the overall point here. It is true that biologically, evolution has no specific direction. It is a search method that indeed thoroughly fills out the “densely branching bush” of nature. It is likewise true that evolutionary changes do not necessarily move in the direction of greater intelligence—they move in all directions. There are many examples of successful creatures that have remained relatively unchanged for millions of years. (Alligators, for instance, date back 200 million years, and many microorganisms go back much further than that.) But in the course of thoroughly filling out myriad evolutionary branches, one of the directions it does move in is toward greater intelligence. That is the relevant point for the purposes of this discussion.
Physical layout of key regions of the brain.
The neocortex in different mammals.
Suppose we have a blue gas in a jar. When we remove the lid, there is no message that goes out to all of the molecules of the gas saying, “Hey, guys, the lid is off the jar; let’s head up toward the opening and out to freedom.” The molecules just keep doing what they always do, which is to move every which way with no seeming direction. But in the course of doing so, some of them near the top will indeed move out of the jar, and over time most of them will follow suit. Once biological evolution stumbled on a neural mechanism capable of hierarchical learning, it found it to be immensely useful for evolution’s one objective, which is survival. The benefit of having a neocortex became acute when quickly changing circumstances favored rapid learning. Species of all kinds—plants and animals—can learn to adapt to changing circumstances over time, but without a neocortex they must use the process of genetic evolution. It can take a great many generations—thousands of years—for a species without a neocortex to learn significant new behaviors (or in the case of plants, other adaptation strategies). The salient survival advantage of the neocortex was that it could learn in a matter of days. If a species encounters dramatically changed circumstances and one member of that species invents or discovers or just stumbles upon (these three methods all being variations of innovation) a way to adapt to that change, other individuals will notice, learn, and copy that method, and it will quickly spread virally to the entire population. The cataclysmic Cretaceous-Paleogene extinction event about 65 million years ago led to the rapid demise of many non-neocortex-bearing species that could not adapt quickly enough to a suddenly altered environment. This marked the turning point for neocortex-capable mammals to take over their ecological niche. 
In this way, biological evolution found that the hierarchical learning of the neocortex was so valuable that this region of the brain continued to grow in size until it virtually took over the brain of Homo sapiens.
Discoveries in neuroscience have established convincingly the key role played by the hierarchical capabilities of the neocortex as well as offered evidence for the pattern recognition theory of mind (PRTM). This evidence is distributed among many observations and analyses, a portion of which I will review here. Canadian psychologist Donald O. Hebb (1904–1985) made an initial attempt to explain the neurological basis of learning. In 1949 he described a mechanism in which neurons change physiologically based on their experience, thereby providing a basis for learning and brain plasticity: “Let us assume that the persistence or repetition of a reverberatory activity (or ‘trace’) tends to induce lasting cellular changes that add to its stability…. When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”2 This theory has been stated as “cells that fire together wire together” and has become known as Hebbian learning. Aspects of Hebb’s theory have been confirmed, in that it is clear that brain assemblies can create new connections and strengthen them, based on their own activity. We can actually see neurons developing such connections in brain scans. Artificial “neural nets” are based on Hebb’s model of neuronal learning.
The central assumption in Hebb’s theory is that the basic unit of learning in the neocortex is the neuron. The pattern recognition theory of mind that I articulate in this book is based on a different fundamental unit: not the neuron itself, but rather an assembly of neurons, which I estimate to number around a hundred. The wiring and synaptic strengths within each unit are relatively stable and determined genetically—that is, the organization within each pattern recognition module is determined by genetic design. Learning takes place in the creation of connections between these units, not within them, and probably in the synaptic strengths of those interunit connections.
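This two-level wiring scheme can be sketched in code. The following is a toy illustration (not a biological model, and the class and method names are my own invention): each module's internal wiring is fixed at creation, while the connections between modules carry adjustable strengths, which is where learning happens.

```python
class PatternModule:
    """One pattern recognizer: roughly 100 neurons with fixed internal wiring."""

    def __init__(self, module_id, n_neurons=100):
        self.module_id = module_id
        # Internal wiring is "genetically determined": fixed at creation,
        # never modified by experience (here, a simple fixed ring of links).
        self.internal_wiring = tuple(
            (i, (i + 1) % n_neurons) for i in range(n_neurons)
        )
        # Learning lives here: weighted links to *other* modules.
        self.out_links = {}  # target module_id -> synaptic strength

    def connect(self, other, strength=0.1):
        """Establish an inter-module connection with an initial strength."""
        self.out_links[other.module_id] = strength

    def strengthen(self, other, delta=0.05):
        """Hebbian-style adjustment of an inter-module connection."""
        self.out_links[other.module_id] = (
            self.out_links.get(other.module_id, 0.0) + delta
        )

# Experience changes the links between modules, never the wiring inside them.
a = PatternModule("edge-detector")
b = PatternModule("contour-detector")
a.connect(b)
a.strengthen(b)
```

The point of the sketch is the asymmetry: `internal_wiring` is immutable after construction, while `out_links` is the only thing that changes with experience.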
Recent support for the idea that the basic unit of learning is an assembly of dozens of neurons comes from Swiss neuroscientist Henry Markram (born in 1962), whose ambitious Blue Brain Project to simulate the entire human brain I describe in chapter 7. In a 2011 paper he describes how while scanning and analyzing actual mammalian neocortex neurons, he was “search[ing] for evidence of Hebbian assemblies at the most elementary level of the cortex.” What he found instead, he writes, were “elusive assemblies [whose] connectivity and synaptic weights are highly predictable and constrained.” He concludes that “these findings imply that experience cannot easily mold the synaptic connections of these assemblies” and speculates that “they serve as innate, Lego-like building blocks of knowledge for perception and that the acquisition of memories involves the combination of these building blocks into complex constructs.” He continues:
Functional neuronal assemblies have been reported for decades, but direct evidence of clusters of synaptically connected neurons…has been missing…. Since these assemblies will all be similar in topology and synaptic weights, not molded by any specific experience, we consider these to be innate assemblies…. Experience plays only a minor role in determining synaptic connections and weights within these assemblies…. Our study found evidence [of] innate Lego-like assemblies of a few dozen neurons…. Connections between assemblies may combine them into super-assemblies within a neocortical layer, then in higher-order assemblies in a cortical column, even higher-order assemblies in a brain region, and finally in the highest possible order assembly represented by the whole brain…. Acquiring memories is very similar to building with Lego. Each assembly is equivalent to a Lego block holding some piece of elementary innate knowledge about how to process, perceive and respond to the world…. When different blocks come together, they therefore form a unique combination of these innate percepts that represents an individual’s specific knowledge and experience.3
The “Lego blocks” that Markram proposes are fully consistent with the pattern recognition modules that I have described. In an e-mail communication, Markram described these “Lego blocks” as “shared content and innate knowledge.”4 I would articulate that the purpose of these modules is to recognize patterns, to remember them, and to predict them based on partial patterns. Note that Markram’s estimate that each module contains “a few dozen neurons” is based only on layer V of the neocortex. Layer V is indeed neuron rich, but based on the usual ratio of neuron counts in the six layers, this would translate to an order of magnitude of about 100 neurons per module, which is consistent with my estimates.
The consistent wiring and apparent modularity of the neocortex has been noted for many years, but this study is the first to demonstrate the stability of these modules as the brain undergoes its dynamic processes.
Another recent study, this one from Massachusetts General Hospital, funded by the National Institutes of Health and the National Science Foundation and published in a March 2012 issue of the journal Science, also shows a regular structure of connections across the neocortex.5 The article describes the wiring of the neocortex as following a grid pattern, like orderly city streets: “Basically, the overall structure of the brain ends up resembling Manhattan, where you have a 2-D plan of streets and a third axis, an elevator going in the third dimension,” wrote Van J. Wedeen, a Harvard neuroscientist and physicist and the head of the study.
In a Science magazine podcast, Wedeen described the significance of the research: “This was an investigation of the three-dimensional structure of the pathways of the brain. When scientists have thought about the pathways of the brain for the last hundred years or so, the typical image or model that comes to mind is that these pathways might resemble a bowl of spaghetti—separate pathways that have little particular spatial pattern in relation to one another. Using magnetic resonance imaging, we were able to investigate this question experimentally. And what we found was that rather than being haphazardly arranged or independent pathways, we find that all of the pathways of the brain taken together fit together in a single exceedingly simple structure. They basically look like a cube. They basically run in three perpendicular directions, and in each one of those three directions the pathways are highly parallel to each other and arranged in arrays. So, instead of independent spaghettis, we see that the connectivity of the brain is, in a sense, a single coherent structure.”
Whereas the Markram study shows a module of neurons that repeats itself across the neocortex, the Wedeen study demonstrates a remarkably orderly pattern of connections between modules. The brain starts out with a very large number of “connections-in-waiting” to which the pattern recognition modules can hook up. Thus if a given module wishes to connect to another, it does not need to grow an axon from one and a dendrite from the other to span the entire physical distance between them. It can simply harness one of these axonal connections-in-waiting and just hook up to the ends of the fiber. As Wedeen and his colleagues write, “The pathways of the brain follow a base-plan established by…early embryogenesis. Thus, the pathways of the mature brain present an image of these three primordial gradients, physically deformed by development.” In other words, as we learn and have experiences, the pattern recognition modules of the neocortex are connecting to these preestablished connections that were created when we were embryos.
There is a type of electronic chip called a field programmable gate array (FPGA) that is based on a similar principle. The chip contains millions of modules that implement logic functions along with connections-in-waiting. At the time of use, these connections are either activated or deactivated (through electronic signals) to implement a particular capability.
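The FPGA principle can be illustrated with a few lines of code. This is a minimal sketch of the idea only, not any real FPGA toolchain API: all the potential wires exist up front, and “programming” the chip (or, in the analogy, learning) merely toggles which ones are active.

```python
class ConnectionGrid:
    """A fixed grid of pre-manufactured 'connections-in-waiting'."""

    def __init__(self, size):
        self.size = size
        # No wire is grown at use time; we only record which existing
        # wires have been switched on or off.
        self.active = {}  # (src, dst) -> bool; absent means never activated

    def activate(self, src, dst):
        """Hook two modules up to an already existing wire."""
        self.active[(src, dst)] = True

    def deactivate(self, src, dst):
        """Prune a connection that turned out not to be needed."""
        self.active[(src, dst)] = False

    def is_connected(self, src, dst):
        return self.active.get((src, dst), False)

grid = ConnectionGrid(size=16)
grid.activate(3, 7)    # harness a connection-in-waiting
grid.deactivate(3, 7)  # ...or prune it later
```

The design choice worth noticing is that `activate` never creates a wire; it only flips the state of one that was there from the start, just as the neocortex hooks pattern recognizers up to axons laid down in embryogenesis.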
In the neocortex, those long-distance connections that are not used are eventually pruned away, which is one reason why adapting a nearby region of the neocortex to compensate for one that has become damaged is not quite as effective as using the original region. According to the Wedeen study, the initial connections are extremely orderly and repetitive, just like the modules themselves, and their grid pattern is used to “guide connectivity” in the neocortex. This pattern was found in all of the primate and human brains studied and was evident across the neocortex, from regions that dealt with early sensory patterns up to higher-level emotions. Wedeen’s Science journal article concluded that the “grid structure of cerebral pathways was pervasive, coherent, and continuous with the three principal axes of development.” This again speaks to a common algorithm across all neocortical functions.
It has long been known that at least certain regions of the neocortex are hierarchical. The best-studied region is the visual cortex, which is separated into areas known as V1, V2, and MT (also known as V5). As we advance to higher areas in this region (“higher” in the sense of conceptual processing, not physically, as the neocortex is always just one pattern recognizer thick), the properties that can be recognized become more abstract. V1 recognizes very basic edges and primitive shapes. V2 can recognize contours, the disparity of images presented by each of the eyes, spatial orientation, and whether or not a portion of the image is part of an object or the background.6 Higher-level regions of the neocortex recognize concepts such as the identity of objects and faces and their movement. It has also long been known that communication through this hierarchy is both upward and downward, and that signals can be both excitatory and inhibitory. MIT neuroscientist Tomaso Poggio (born in 1947) has extensively studied vision in the human brain, and his research for the last thirty-five years has been instrumental in establishing hierarchical learning and pattern recognition in the “early” (lowest conceptual) levels of the visual neocortex.7
The highly regular grid structure of initial connections in the neocortex found in a National Institutes of Health study.
Another view of the regular grid structure of neocortical connections.
The grid structure found in the neocortex is remarkably similar to what is called crossbar switching, which is used in integrated circuits and circuit boards.
Our understanding of the lower hierarchical levels of the visual neocortex is consistent with the PRTM I described in the previous chapter, and observation of the hierarchical nature of neocortical processing has recently extended far beyond these levels. University of Texas neurobiology professor Daniel J. Felleman and his colleagues traced the “hierarchical organization of the cerebral cortex…[in] 25 neocortical areas,” which included both visual areas and higher-level areas that combine patterns from multiple senses. What they found as they went up the neocortical hierarchy was that the processing of patterns became more abstract, comprised larger spatial areas, and involved longer time periods. With every connection they found communication both up and down the hierarchy.8
Recent research allows us to substantially broaden these observations to regions well beyond the visual cortex and even to the association areas, which combine inputs from multiple senses. A study published in 2008 by Princeton psychology professor Uri Hasson and his colleagues demonstrates that the phenomena observed in the visual cortex occur across a wide variety of neocortical areas: “It is well established that neurons along the visual cortical pathways have increasingly larger spatial receptive fields. This is a basic organizing principle of the visual system…. Real-world events occur not only over extended regions of space, but also over extended periods of time. We therefore hypothesized that a hierarchy analogous to that found for spatial receptive field sizes should also exist for the temporal response characteristics of different brain regions.” This is exactly what they found, which enabled them to conclude that “similar to the known cortical hierarchy of spatial receptive fields, there is a hierarchy of progressively longer temporal receptive windows in the human brain.”9
The most powerful argument for the universality of processing in the neocortex is the pervasive evidence of plasticity (not just learning but interchangeability): one region is able to do the work of other regions, implying a common algorithm across the entire neocortex. A great deal of neuroscience research has been focused on identifying which regions of the neocortex are responsible for which types of patterns. The classical technique for determining this has been to take advantage of brain damage from injury or stroke and to correlate lost functionality with specific damaged regions. So, for example, when we notice that someone with newly acquired damage to the fusiform gyrus region suddenly has difficulty recognizing faces but is still able to identify people from their voices and language patterns, we can hypothesize that this region has something to do with face recognition. The underlying assumption has been that each of these regions is designed to recognize and process a particular type of pattern. Particular physical regions have become associated with particular types of patterns, because under normal circumstances that is how the information happens to flow. But when that normal flow of information is disrupted for any reason, another region of the neocortex is able to step in and take over.
Plasticity has been widely noted by neurologists, who observed that patients with brain damage from an injury or a stroke can relearn the same skills in another area of the neocortex. Perhaps the most dramatic example of plasticity is a 2011 study by American neuroscientist Marina Bedny and her colleagues on what happens to the visual cortex of congenitally blind people. The common wisdom has been that the early layers of the visual cortex, such as V1 and V2, inherently deal with very low-level patterns (such as edges and curves), whereas the frontal cortex (that evolutionarily new region of the cortex that we have in our uniquely large foreheads) inherently deals with the far more complex and subtle patterns of language and other abstract concepts. But as Bedny and her colleagues found, “Humans are thought to have evolved brain regions in the left frontal and temporal cortex that are uniquely capable of language processing. However, congenitally blind individuals also activate the visual cortex in some verbal tasks. We provide evidence that this visual cortex activity in fact reflects language processing. We find that in congenitally blind individuals, the left visual cortex behaves similarly to classic language regions…. We conclude that brain regions that are thought to have evolved for vision can take on language processing as a result of early experience.”10
Consider the implications of this study: It means that neocortical regions that are physically relatively far apart, and that have also been considered conceptually very different (primitive visual cues versus abstract language concepts), use essentially the same algorithm. The regions that process these disparate types of patterns can substitute for one another.
University of California at Berkeley neuroscientist Daniel E. Feldman wrote a comprehensive 2009 review of what he called “synaptic mechanisms for plasticity in the neocortex” and found evidence for this type of plasticity across the neocortex. He writes that “plasticity allows the brain to learn and remember patterns in the sensory world, to refine movements…and to recover function after injury.” He adds that this plasticity is enabled by “structural changes including formation, removal, and morphological remodeling of cortical synapses and dendritic spines.”11
Another startling example of neocortical plasticity (and therefore of the uniformity of the neocortical algorithm) was recently demonstrated by scientists at the University of California at Berkeley. They hooked up implanted microelectrode arrays to pick up brain signals specifically from a region of the motor cortex of mice that controls the movement of their whiskers. They set up their experiment so that the mice would get a reward if they controlled these neurons to fire in a certain mental pattern but not to actually move their whiskers. The pattern required to get the reward involved a mental task that their frontal neurons would normally not do. The mice were nonetheless able to perform this mental feat essentially by thinking with their motor neurons while mentally decoupling them from controlling motor movements.12 The conclusion is that the motor cortex, the region of the neocortex responsible for coordinating muscle movement, also uses the standard neocortical algorithm.
There are several reasons, however, why a skill or an area of knowledge that has been relearned using a new area of the neocortex to replace one that has been damaged will not necessarily be as good as the original. First, because it took an entire lifetime to learn and perfect a given skill, relearning it in another area of the neocortex will not immediately generate the same results. More important, that new area of the neocortex has not just been sitting around waiting as a standby for an injured region. It too has been carrying out vital functions, and will therefore be hesitant to give up its neocortical patterns to compensate for the damaged region. It can start by releasing some of the redundant copies of its patterns, but doing so will subtly degrade its existing skills and does not free up as much cortical space as the relearned skills had originally occupied.
There is a third reason why plasticity has its limits. Since in most people particular types of patterns will flow through specific regions (such as faces being processed by the fusiform gyrus), these regions have become optimized (by biological evolution) for those types of patterns. As I report in chapter 7, we found the same result in our digital neocortical developments. We could recognize speech with our character recognition systems and vice versa, but the speech systems were optimized for speech and similarly the character recognition systems were optimized for printed characters, so there would be some reduction in performance if we substituted one for the other. We actually used evolutionary (genetic) algorithms to accomplish this optimization, a simulation of what biology does naturally. Given that faces have been flowing through the fusiform gyrus for most people for hundreds of thousands of years (or more), biological evolution has had time to evolve a favorable ability to process such patterns in that region. It uses the same basic algorithm, but it is oriented toward faces. As Dutch neuroscientist Randal Koene wrote, “The [neo]cortex is very uniform, each column or minicolumn can in principle do what each other one can do.”13
Substantial recent research supports the observation that the pattern recognition modules wire themselves based on the patterns to which they are exposed. For example, neuroscientist Yi Zuo and her colleagues watched as new “dendritic spines” formed connections between nerve cells as mice learned a new skill (reaching through a slot to grab a seed).14 Researchers at the Salk Institute have discovered that this critical self-wiring of the neocortex modules is apparently controlled by only a handful of genes. These genes and this method of self-wiring are also uniform across the neocortex.15
Many other studies document these attributes of the neocortex, but let’s summarize what we can observe from the neuroscience literature and from our own thought experiments. The basic unit of the neocortex is a module of neurons, which I estimate at around a hundred. These are woven together into each neocortical column so that each module is not visibly distinct. The pattern of connections and synaptic strengths within each module is relatively stable. It is the connections and synaptic strengths between modules that represent learning.
There are on the order of a quadrillion (10¹⁵) connections in the neocortex, yet only about 25 million bytes of design information in the genome (after lossless compression),16 so the connections themselves cannot possibly be predetermined genetically. It is possible that some of this learning is the product of the neocortex’s interrogating the old brain, but that still would necessarily represent only a relatively small amount of information. The connections between modules are created on the whole from experience (nurture rather than nature).
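The arithmetic behind this claim is worth making explicit. Even if the genome devoted every compressed bit to wiring, and even if a single bit sufficed to specify each connection, the shortfall is enormous:

```python
# Back-of-the-envelope check: can 25 million bytes of genome specify
# a quadrillion connections? (Figures are the ones cited in the text.)
connections = 10**15            # ~1 quadrillion neocortical connections
genome_bytes = 25 * 10**6       # ~25 million bytes after lossless compression
genome_bits = genome_bytes * 8  # = 2 x 10^8 bits

bits_needed = connections       # at least 1 bit just to mark each connection
ratio = bits_needed / genome_bits
print(f"shortfall: {ratio:,.0f}x")
# ratio == 5,000,000: the genome falls short by more than six orders
# of magnitude, so the wiring must come from experience, not the genome.
```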
The brain is not flexible enough for each neocortical pattern recognition module simply to link to any other module (as we can easily program in our computers or on the Web): an actual physical connection must be made, composed of an axon connecting to a dendrite. We each start out with a vast stockpile of possible neural connections. As the Wedeen study shows, these connections are organized in a very repetitive and orderly manner. Terminal connection to these axons-in-waiting takes place based on the patterns that each neocortical pattern recognizer has recognized. Unused connections are ultimately pruned away. These connections are built hierarchically, reflecting the natural hierarchical order of reality. That is the key strength of the neocortex.
The basic algorithm of the neocortical pattern recognition modules is equivalent across the neocortex from “low-level” modules, which deal with the most basic sensory patterns, to “high-level” modules, which recognize the most abstract concepts. The vast evidence of plasticity and the interchangeability of neocortical regions is testament to this important observation. There is some optimization of regions that deal with particular types of patterns, but this is a second-order effect—the fundamental algorithm is universal.
Signals go up and down the conceptual hierarchy. A signal going up means, “I’ve detected a pattern.” A signal going down means, “I’m expecting your pattern to occur,” and is essentially a prediction. Both upward and downward signals can be either excitatory or inhibitory.
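The two signal directions can be sketched as simple messages. This is purely illustrative (the function names and message fields are my own), but it captures the four cases: up or down, excitatory or inhibitory.

```python
def signal_up(pattern_name, excitatory=True):
    """Child -> parent: 'I've detected my pattern.'"""
    return {"direction": "up", "pattern": pattern_name,
            "sign": +1 if excitatory else -1}

def signal_down(pattern_name, excitatory=True):
    """Parent -> child: 'I'm expecting your pattern' -- a prediction."""
    return {"direction": "down", "pattern": pattern_name,
            "sign": +1 if excitatory else -1}

# A low-level recognizer reports what it saw...
detected = signal_up("vertical edge")
# ...and a higher-level recognizer primes one expectation while
# suppressing a competing one.
predicted = signal_down("letter A", excitatory=True)
inhibited = signal_down("letter B", excitatory=False)
```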
Each pattern is itself in a particular order and is not readily reversed. Even if a pattern appears to have multidimensional aspects, it is represented by a one-dimensional sequence of lower-level patterns. A pattern is an ordered sequence of other patterns, so each recognizer is inherently recursive. There can be many levels of hierarchy.
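The recursive structure just described is easy to demonstrate: a pattern is an ordered sequence of lower-level patterns, bottoming out in primitives, and unwinding it always yields a one-dimensional sequence. The example dictionary below is hypothetical, using letters and words as stand-in patterns.

```python
PATTERNS = {
    # each pattern is an ordered sequence of lower-level patterns
    "APPLE": ["A", "P", "P", "L", "E"],
    "PIE": ["P", "I", "E"],
    # a higher-level pattern is a sequence of word-level patterns
    "APPLE PIE": ["APPLE", "PIE"],
}

def expand(pattern):
    """Recursively unwind a pattern into its one-dimensional primitive sequence."""
    if pattern not in PATTERNS:  # a primitive: the bottom of the hierarchy
        return [pattern]
    sequence = []
    for sub in PATTERNS[pattern]:
        sequence.extend(expand(sub))
    return sequence

expand("APPLE PIE")  # -> ['A', 'P', 'P', 'L', 'E', 'P', 'I', 'E']
```

However many levels the hierarchy has, `expand` always returns a flat ordered list, which is the sense in which even multidimensional-seeming patterns are represented as one-dimensional sequences.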
There is a great deal of redundancy in the patterns we learn, especially the important ones. The recognition of patterns (such as common objects and faces) uses the same mechanism as our memories, which are just patterns we have learned. They are also stored as sequences of patterns—they are basically stories. That mechanism is also used for learning and carrying out physical movement in the world. The redundancy of patterns is what enables us to recognize objects, people, and ideas even when they have variations and occur in different contexts. The size and size variability parameters also allow the neocortex to encode variation in magnitude against different dimensions (duration in the case of sound). One way that these magnitude parameters could be encoded is simply through multiple patterns with different numbers of repeated inputs. So, for example, there could be patterns for the spoken word “steep” with different numbers of the long vowel [E] repeated, each with the importance parameter set to a moderate level indicating that the repetition of [E] is variable. This approach is not mathematically equivalent to having the explicit size parameters and does not work nearly as well in practice, but is one approach to encoding magnitude. The strongest evidence we have for these parameters is that they are needed in our AI systems to get accuracy levels that are near human levels.
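The repetition-based scheme for encoding magnitude can be shown concretely. The sketch below, with made-up phoneme spellings, stores several variants of "steep" that differ only in how many times the long vowel [E] repeats; matching any variant counts as recognizing the word. As noted above, this is a cruder approach than explicit size parameters, but it illustrates the idea.

```python
# Store variants of "steep" with 2 to 5 repetitions of the long vowel [E].
# The notation ("s t E E E p") is an illustrative stand-in for phonemes.
STEEP_VARIANTS = ["s t " + "E " * n + "p" for n in range(2, 6)]

def matches_steep(phonemes):
    """Recognize 'steep' by matching against any stored repetition variant."""
    return phonemes in STEEP_VARIANTS

matches_steep("s t E E E p")  # True: three repetitions is a stored variant
matches_steep("s t E p")      # False: a single short [E] was never stored
```

Note the cost this sketch makes visible: every degree of magnitude requires its own stored copy of the pattern, whereas an explicit size parameter would need only one.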
The summary above constitutes the conclusions we can draw from the research results I have sampled here as well as from the thought experiments I discussed earlier. I maintain that the model I have presented is the only possible model that satisfies all of the constraints that the research and our thought experiments have established.
Finally, there is one more piece of corroborating evidence. The techniques that we have evolved over the past several decades in the field of artificial intelligence to recognize and intelligently process real-world phenomena (such as human speech and written language) and to understand natural-language documents turn out to be mathematically similar to the model I have presented above. They are also examples of the PRTM. The AI field was not explicitly trying to copy the brain, but it nonetheless arrived at essentially equivalent techniques.