CHAPTER 9
Pattern and Provability
Principia Mathematica and its Theorems
IN THE early twentieth century, Bertrand Russell, spurred by the maxim “Find and study paradoxes; design and build great ramparts to keep them out!” (my words, not his), resolved that in Principia Mathematica, his new barricaded fortress of mathematical reasoning, no set could ever contain itself, and no sentence could ever turn around and talk about itself. These parallel bans were intended to save Principia Mathematica from the trap that more naïve theories had fallen into. But something truly strange turned up when Kurt Gödel looked closely at what I will call PM — that is, the formal system used in Principia Mathematica for reasoning about sets (and about numbers, but they came later, as they were defined in terms of sets).
Let me be a little more explicit about this distinction between Principia Mathematica and PM. The former is a set of three hefty tomes, whereas PM is a set of precise symbol-manipulation rules laid out and explored in depth in those tomes, using a rather arcane notation (see the end of this chapter). The distinction is analogous to that between Isaac Newton’s massive tome entitled Principia and the laws of mechanics that he set forth therein.
Although it took many chapters of theorems and derivations before the rather lowly fact that one plus one equals two (written in PM notation as “s0 + s0 = ss0”, where the letter “s” stands for the concept “successor of ”) was rigorously demonstrated using the strict symbol-shunting rules of PM, Gödel nonetheless realized that PM, though terribly cumbersome, had enormous power to talk about whole numbers — in fact, to talk about arbitrarily subtle properties of whole numbers. (By the way, that little phrase “arbitrarily subtle properties” already gives the game away, though the hint is so veiled that almost no one is aware of how much the words imply. It took Gödel to fully see it.)
For instance, as soon as enough set-theoretical machinery had been introduced in Principia Mathematica to allow basic arithmetical notions like addition and multiplication to enter the picture, it became easy to define, within the PM formalism, more interesting notions such as “square” (i.e., the square of a whole number), “nonsquare”, “prime”, and “composite”.
There could thus be, at least in theory, a volume of Principia Mathematica devoted entirely to exploring the question of which integers are, and which are not, the sum of two squares. For instance, 41 is the sum of 16 and 25, and there are infinitely many other integers that can be made by summing two squares. Call them members of Class A. On the other hand, 43 is not the sum of any pair of squares, and likewise, there are infinitely many other integers that cannot be made by summing two squares. Call them members of Class B. (Which class is 109 in? What about 133?) Fully fathoming this elegant dichotomy of the set of all integers, though a most subtle task, had been accomplished by number theorists long before Gödel’s birth.
Analogously, one could imagine another volume of Principia Mathematica devoted entirely to exploring the question of which integers are, and which are not, the sum of two primes. For instance, 24 is the sum of 5 and 19, whereas 23 is not the sum of any pair of primes. Once again, we can call these two classes of integers “Class C” and “Class D”, respectively. Each class has infinitely many members. Fully fathoming this elegant dichotomy of the set of all integers represents a very deep and, as of today, still unsolved challenge for number theorists, though much progress has been made in the two-plus centuries since the problem was first posed.
Mixing Two Unlikely Ideas: Primes and Squares
Before we look into Gödel’s unexpected twist-based insight into PM, I need to comment first on the profound joy in discovering patterns, and next on the profound joy in understanding what lies behind patterns. It is mathematicians’ relentless search for why that in the end defines the nature of their discipline. One of my favorite facts in number theory will, I hope, allow me to illustrate this in a pleasing fashion.
Let us ask ourselves a simple enough question concerning prime numbers: Which primes are sums of two squares (41, for example), and which primes are not (43, for example)? In other words, let’s go back to Classes A and B, both of which are infinite, and ask which prime numbers lie in each of them. Is it possible that nearly all prime numbers are in one of these classes, and just a few in the other? Or is it about fifty–fifty? Are there infinitely many primes in each class? Given an arbitrary prime number p, is there a quick and simple test to determine which class p belongs to (without trying out all possible additions of two squares smaller than p)? Is there any kind of predictable pattern concerning how primes are distributed in these two classes, or is it just a jumbly chaos?
To some readers, these may seem like peculiar or even unnatural questions to tackle, but mathematicians are constitutionally very curious people, and it happens that they are often deeply attracted by the idea of exploring interactions between concepts that do not, a priori, seem related at all (such as the primes and the squares). What often happens is that some kind of unexpected yet intimate connection turns up — some kind of crazy hidden regularity that feels magical, the discovery or the revelation of which may even send mystical frissons up and down one’s spine. I, for one, shamelessly admit to being highly susceptible to such spine-tingling mixtures of awe, beauty, mystery, and surprise.
To get a feel for this kind of thing, let us take the list of all the primes up to 100 — 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 — a rather jumbly, chaotic list, by the way — and redisplay it, highlighting those primes that are sums of two squares (that is, Class A primes), and leaving untouched those that are not (Class B primes). Here is what we get:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ,…
Do you see anything interesting going on here? Well, for one thing, isn’t it already quite a surprise that it seems to be a fairly even competition? Why should that be the case? Why shouldn’t either Class A or Class B be dominant? Will either the Class A primes or the Class B primes take over after a while, or will their roughly even balance continue forever? As we go out further and further towards infinity, will the balance tend closer and closer to being exactly fifty–fifty? If so, why would such an amazing, delicate balance hold? To me, there is something enormously alluring here, and so I encourage you to look at this display for a little while — a few minutes, say — and try to find any patterns in it, before going on.
Pattern-hunting
All right, reader, here we are, back together again, hopefully after a bit of pattern-searching on your part. Most likely you noticed that our act of highlighting seems, not by intention but by chance (or is it chance?), to have broken the list into singletons and pairs. A hidden connection revealed?
Let’s look into this some more. The boldface pairs are 13–17, 37–41, and 89–97, while the non-boldface pairs are 7–11, 19–23, 43–47, 67–71, and 79–83. Suppose, then, that we replace each pair by the letter “P” and each singleton by the letter “S”, retaining the highlighting that distinguishes Class A from Class B. We then get the following sequence of letters:
S, S, S, P, P, P, S, S, P, P, S, S, S, P, S, P, P, . . .
Is there some kind of pattern here, or is there none? What do you think? If we pull out just the Class-A letters, we get this: SSPSPSSSP; and if we pull out just the Class-B letters, we get this: SPPSPSPP. If there is any kind of periodicity or subtler type of rhythmicity here, it’s certainly elusive. No simple predictable pattern jumps out either in boldface or in non-boldface, nor did any jump out when they were mixed together. We have picked up a hint of a quite even balance between the two classes, yet we lack any hints as to why that might be. This is provocative but frustrating.
People who Pursue Patterns with Perseverance
At this juncture, I feel compelled to point out a distinction not between two classes of numbers, but between two classes of people. There are those who will immediately be drawn to the idea of pattern-seeking, and there are those who will find it of no appeal, perhaps even distasteful. The former are, in essence, those who are mathematically inclined, and the latter are those who are not. Mathematicians are people who at their deepest core are drawn on — indeed, are easily seduced — by the urge to find patterns where initially there would seem to be none. The passionate quest after order in an apparent disorder is what lights their fires and fires their souls. I hope you are among this class of people, dear reader, but even if you are not, please do bear with me for a moment.
It may seem that we have already divined a pattern of sorts — namely, that we will forever encounter just singletons and pairs. Even if we can’t quite say how the S’s and P’s will be interspersed, it appears at least that the imposition of the curious dichotomy “sums-of-two-squares vs. not-sums-of-two-squares” onto the sequence of the prime numbers breaks it up into singletons and pairs, which is already quite a fantastic discovery! Who would have guessed?
Unfortunately, I must now confess that I have misled you. If we simply throw the very next prime, which is 101, into our list, it sabotages the seeming order we’ve found. After all, the prime number 101, being the sum of the two squares 1 and 100, and thus belonging to Class A, has to be written in boldface, and so our alleged boldface pair 89–97 turns out to be a boldface triplet instead. And thus our hopeful notion of a sequence of just S’s and P’s goes down the drain.
What does a pattern-seeker do at this point — give up? Of course not! After a setback, a flexible pattern-seeker merely regroups. Indeed, taking our cue from the word just given, let us try regrouping our sequence of primes in a different fashion. Suppose we segregate the two classes, displaying them on separate lines. This will give us the following:
Yes square + square: 2, 5, 13, 17, 29, 37, 41, 53, 61, 73, 89, 97, 101,…
No square + square: 3, 7, 11, 19, 23, 31, 43, 47, 59, 67, 71, 79, 83,…
Do you see anything yet? If not, let me give you a hint. What if you simply take the differences between adjacent numbers in each line? Try it yourself — or else, if you’re very lazy, then just read on.
In the upper line, you will get 3, 8, 4, 12, 8, 4, 12, 8, 12, 16, 8, 4, whereas in the lower line you will get 4, 4, 8, 4, 8, 12, 4, 12, 8, 4, 8, 4. There is something that surely should jump out at even the most indifferent reader at this point: not only is there a preponderance of just a few integers (4, 8, and 12), but moreover, all these integers are multiples of 4. This seems too much to be merely coincidental.
And the only larger number in these two lists — 16 — is also a multiple of 4. Will this new pattern — multiples of 4 exclusively — hold up forever? (Of course, there is that party-pooper of a ‘3’ at the very outset, but we can chalk it up to the fact that 2 is the only even prime. No big deal.)
Where There’s Pattern, There’s Reason
The key thought in the preceding few lines is the article of faith that this pattern cannot merely be a coincidence. A mathematician who finds a pattern of this sort will instinctively ask, “Why? What is the reason behind this order?” Not only will all mathematicians wonder what the reason is, but even more importantly, they will all implicitly believe that whether or not anyone ever finds the reason, there must be a reason for it. Nothing happens “by accident” in the world of mathematics. The existence of a perfect pattern, a regularity that goes on forever, reveals — just as smoke reveals a fire — that something is going on behind the scenes. Mathematicians consider it a sacred goal to seek that thing, uncover it, and bring it out into the open.
This activity is called, as you well know, “finding a proof ”, or stated otherwise, turning a conjecture into a theorem. The late great eccentric Hungarian mathematician Paul Erdös once made the droll remark that “a mathematician is a device for turning coffee into theorems”, and although there is surely truth in his witticism, it would be more accurate to say that mathematicians are devices for finding conjectures and turning them into theorems.
What underlies the mathematical mindset is an unshakable belief that whenever some mathematical statement X is true, then X has a proof, and vice versa. Indeed, to the mathematical mind, “having a proof ” is no more and no less than what “being true” means! Symmetrically, “being false” means “having no proof ”. One can find hints of a perfect, infinite pattern by doing numerical explorations, as we did above, but how can one know for sure that a suspected regularity will continue forever, without end? How can one know, for instance, that there are infinitely many prime numbers? How do we know there will not, at some point, be a last one — the Great Last Prime P?
If it existed, P would be a truly important and interesting number, but if you look at a long list of consecutive primes (the list above of primes up to 100 gives the flavor), you will see that although their rhythm is a bit “bumpy”, with odd gaps here and there, the interprime gaps are always quite small compared to the size of the primes involved. Given this very clear trend, if the primes were to run out all of a sudden, it would almost feel like falling off the edge of the Earth without any warning. It would be a huge shock. Still, how do we know this won’t happen? Or do we know it? Finding, with the help of a computer, that new primes keep on showing up way out into the billions and the trillions is great, but it won’t guarantee in rock-solid fashion that they won’t just stop all of a sudden somewhere out further. We have to rely on reasoning to get us there, because although finite amounts of evidence can be strongly suggestive, they just don’t cut the mustard, because infinity is very different from any finite number.
Sailing the Ocean of Primes and Falling off the Edge
You probably have seen Euclid’s proof of the infinitude of the primes somewhere, but if not, you have missed out on one of the most crucial pillars of human knowledge that ever have been found. It would be a gap in your experience of life as sad as never having tasted chocolate or never having heard a piece of music. I can’t tolerate such crucial gaps in my readers’ knowledge, so here goes nothing!
Let’s suppose that P, the Great Last Prime in the Sky, does exist, and see what that supposition leads to. For P to exist means that there is a Finite, Closed Club of All Primes, of which P itself is the glorious, crowning, final member. Well then, let’s boldly multiply all the primes in the Closed Club together to make a delightfully huge number called Q. This number Q is thus divisible by 2 and also by 3, 5, 7, 11, and so forth. By its definition, Q is divisible by every prime in the Club, which means by every prime in the universe! And now, for a joyous last touch, as in birthday parties, let’s add one candle to grow on, to make Q + 1. So here’s a colossal number that, we are assured, is not prime, since P (which is obviously dwarfed by Q) is the Great Last Prime, the biggest prime of all. All numbers beyond P are, by our initial supposition, composite. Therefore Q + 1, being way beyond P and hence composite, has to have some prime divisor. (Remember this, please.)
What could that unknown prime divisor be? It can’t be 2, because 2 divides Q itself, which is just one step below Q + 1, and two even numbers are never located at a distance of 1 from each other. It also can’t be 3, because 3 likewise divides Q itself, and numbers divisible by 3 are never next-door neighbors! In fact, whatever prime p that we select from the Club, we find that p can’t divide Q + 1, because p divides its lower neighbor Q (and multiples of p are never next-door neighbors — they come along only once every p numbers). And so reasoning has shown us that none of the members of the Finite, Closed Club of Primes divides Q + 1.
But above, I observed (and I asked you to remember) that Q + 1, being composite, has to have a prime divisor. Sting! We have been caught in a trap, painted ourselves into a corner. We have concocted a crazy number — a number that on the one hand must be composite (i.e., has some smaller prime divisor) and yet on the other hand has no smaller prime divisor. This contradiction came out of our assumption that there was a Finite, Closed Club of Primes, gloriously crowned by P, and so we have no choice but to go back and erase that whole amusing, suspect vision.
There cannot be a “Great Last Prime in the Sky”; there cannot be a “Finite, Closed Club of All Primes”. These are fictions. The truth, as we have just demonstrated, is that the list of primes goes on without end. We will never, ever “fall off the Earth”, no matter how far out we go. Of that we now are assured by flawless reasoning, in a way that no finite amount of computational sailing among seas of numbers could ever have assured us.
If, perchance, coming to understand why there is no last prime (as opposed to merely knowing that it is the case) was a new experience to you, I hope you savored it as much as a piece of chocolate or of music. And just like such experiences, following this proof is a source of pleasure that one can come back to and dip into many times, finding it refreshing each new time. Moreover, this proof is a rich source of other proofs — Variations on a Theme by Euclid (though we will not explore them here).
The Mathematician’s Credo
We have just seen up close a lovely example of what I call the “Mathematician’s Credo”, which I will summarize as follows:
X is true because there is a proof of X;
X is true and so there is a proof of X.
Notice that this is a two-way street. The first half of the Credo asserts that proofs are guarantors of truth, and the second half asserts that where there is a regularity, there is a reason. Of course we ourselves may not uncover the hidden reason, but we firmly and unquestioningly believe that it exists and in principle could someday be found by someone.
To doubt either half of the Credo would be unthinkable to a mathematician. To doubt the first line would be to imagine that a proved statement could nonetheless be false, which would make a mockery of the notion of “proof ”, while to doubt the second line would be to imagine that within mathematics there could be perfect, exceptionless patterns that go on forever, yet that do so with no rhyme or reason. To mathematicians, this idea of flawless but reasonless structure makes no sense at all. In that regard, mathematicians are all cousins of Albert Einstein, who famously declared, “God does not play dice.” What Einstein meant is that nothing in nature happens without a cause, and for mathematicians, that there is always one unifying, underlying cause is an unshakable article of faith.
No Such Thing as an Infinite Coincidence
We now return to Class A versus Class B primes, because we had not quite reached our revelation, had not yet experienced that mystical frisson I spoke of. To refresh your memory, we had noticed that each line was characterized by differences of the form 4n — that is, 4, 8, 12, and so forth. We didn’t prove this fact, but we observed it often enough that we conjectured it.
The lower line in our display starts out with 3, so our conjecture would imply that all the other numbers in that line are gotten by adding various multiples of 4 to 3, and consequently, that every number in that line is of the form 4n + 3. Likewise (if we ignore the initial misfit of 2), the first number in the upper line is 5, so if our conjecture is true, then every subsequent number in that line is of the form 4n + 1.
Well, well — our conjecture has suggested a remarkably simple pattern to us: Primes of the form 4n + 1 can be represented as sums of two squares, while primes of the form 4n + 3 cannot. If this guess is correct, it establishes a beautiful, spectacular link between primes and squares (two classes of numbers that a priori would seem to have nothing to do with each other), one that catches us completely off guard. This is a glimpse of pure magic — the kind of magic that mathematicians live for.
And yet for a mathematician, this flash of joy is only the beginning of the story. It is like a murder mystery: we have found out someone is dead, but whodunnit? There always has to be an explanation. It may not be easy to find or easy to understand, but it has to exist.
Here, we know (or at least we strongly suspect) that there is a beautiful infinite pattern, but for what reason? The bedrock assumption is that there is a reason here — that our pattern, far from being an “infinite coincidence”, comes from one single compelling, underlying reason; that behind all these infinitely many “independent” facts lies just one phenomenon.
As it happens, there is actually much more to the pattern we have glimpsed. Not only are primes of the form 4n + 3 never the sum of two squares (proving this is easy), but also it turns out that every prime number of the form 4n + 1 has one and only one way of being the sum of two squares. Take 101, for example. Not only does 101 equal 100 + 1, but there is no other sum of two squares that yields 101. Finally, it turns out that in the limit, as one goes further and further out, the ratio of the number of Class A primes to the number of Class B primes grows ever closer to 1. This means that the delicate balance that we observed in the primes below 100 and conjectured would continue ad infinitum is rigorously provable.
Although I will not go further into this particular case study, I will state that many textbooks of number theory prove this theorem (it is far from trivial), thus supplementing a pattern with a proof. As I said earlier, X is true because X has a proof, and conversely, X is true and so X has a proof.
The Long Search for Proofs, and for their Nature
I mentioned above that the question “Which numbers are sums of two primes?”, posed almost 300 years ago, has never been fully solved. Mathematicians are dogged searchers, however, and their search for a proof may go on for centuries, even millennia. They are not discouraged by eons of failure to find a proof of a mathematical pattern that, from numerical trends, seems likely to go on and on forever. Indeed, extensive empirical confirmation of a mathematical conjecture, which would satisfy most people, only makes mathematicians more ardent and more frustrated. They want a proof as good as Euclid’s, not just lots of spot checks! And they are driven by their belief that a proof has to exist — in other words, that if no proof existed, then the pattern in question would have to be false.
This, then, constitutes the flip side of the Mathematician’s Credo:
X is false because there is no proof of X;
X is false and so there is no proof of X.
In a word, just as provability and truth are the same thing for a mathematician, so are nonprovability and falsity. They are synonymous.
During the centuries following the Renaissance, mathematics branched out into many subdisciplines, and proofs of many sorts were found in all the different branches. Once in a while, however, results that were clearly absurd seemed to have been rigorously proven, yet no one could pinpoint where things had gone awry. As stranger and stranger results turned up, the uncertainty about the nature of proofs became increasingly disquieting, until finally, in the middle of the nineteenth century, a powerful movement arose whose goal was to specify just what reasoning really was, and to bond it forever with mathematics, fusing the two into one.
Many philosophers and mathematicians contributed to this noble goal, and around the turn of the twentieth century it appeared that the goal was coming into sight. Mathematical reasoning seemed to have been precisely characterized as the repeated use of certain basic rules of logic, dubbed rules of inference, such as modus ponens: If you have proven a result X and you have also proven X ⇒ Y (where the arrow represents the concept of implication, so that the line means “If X is true, then Y is also true”), then you can toss Y into the bin of proven results. There were a few other fundamental rules of inference, but it was agreed that not very many were needed. About a decade into the twentieth century, Bertrand Russell and Alfred North Whitehead codified these rules in a uniform if rather prickly notation (see facing page), thus apparently allowing all the different branches of mathematics to be folded in with logic, making a seamless, perfect unity.
Thanks to Russell and Whitehead’s grand work, Principia Mathematica, people no longer needed to fear falling into hidden crevasses of false reasoning. Theorems were now understood as simply being the bottom lines of sequences of symbol-manipulations whose top lines were either axioms or earlier theorems. Mathematical truth was all coming together so elegantly. And as this Holy Grail was emerging into clear view, a young boy was growing up in the town of Brünn, Austro-Hungary.