Part One. “Regulability”

It is said that cyberspace can’t be regulated. But what does it mean to say that something could be regulated? What makes regulation possible? That’s the question raised in this Part. If the Internet can’t be regulated, why? And whatever the reason, can it change? Might an unregulable space be tamed? Might the Wild West be won, and how?

Chapter 3. Is-ism

Is the way it is the way it must be?

The rise of an electronic medium that disregards geographical boundaries throws the law into disarray by creating entirely new phenomena that need to become the subject of clear legal rules but that cannot be governed, satisfactorily, by any current territorially based sovereign.

David Johnson and David Post[1]


Some things never change about governing the Web. Most prominent is its innate ability to resist governance in almost any form.

Tom Steinert-Threlkeld[2]

If there was a meme that ruled talk about cyberspace, it was that cyberspace was a place that could not be regulated. That it “cannot be governed”; that its “nature” is to resist regulation. Not that cyberspace cannot be broken, or that government cannot shut it down. But if cyberspace exists, so first-generation thinking goes, government’s power over behavior there is quite limited. In its essence, cyberspace is a space of no control.

Nature. Essence. Innate. The way things are. This kind of rhetoric should raise suspicions in any context. It should especially raise suspicion here. If there is any place where nature has no rule, it is in cyberspace. If there is any place that is constructed, cyberspace is it. Yet the rhetoric of “essence” hides this constructedness. It misleads our intuitions in dangerous ways.

This is the fallacy of “is-ism” — the mistake of confusing how something is with how it must be. There is certainly a way that cyberspace is. But how cyberspace is is not how cyberspace has to be. There is no single way that the Net has to be; no single architecture that defines the nature of the Net. The possible architectures of something that we would call “the Net” are many, and the character of life within those different architectures is diverse.

That most of us commit this fallacy is not surprising. Most of us haven’t a clue about how networks work. We therefore have no clue about how they could be different. We assume that the way we find things is the way things have to be. We are not trained to think about all the different ways technology could achieve the same ends through different means. That sort of training is what technologists get. Most of us are not technologists.

But underlying everything in this book is a single normative plea: that all of us must learn at least enough to see that technology is plastic. It can be remade to do things differently. And that if there is a mistake that we who know too little about technology should make, it is the mistake of imagining technology to be too plastic, rather than not plastic enough. We should expect — and demand — that it can be made to reflect any set of values that we think important. The burden should be on the technologists to show us why that demand can’t be met.

The particular is-ism that I begin with here is the claim that cyberspace can’t be regulated. As this and the following chapters argue, that view is wrong. Whether cyberspace can be regulated depends upon its architecture. The original architecture of the Internet made regulation extremely difficult. But that original architecture can change. And there is all the evidence in the world that it is changing. Indeed, under the architecture that I believe will emerge, cyberspace will be the most regulable space humans have ever known. The “nature” of the Net might once have been its unregulability; that “nature” is about to flip.

To see the flip, you must first see a contrast between two different cyber-places. These two cyber-places are ideal types, and, indeed, one of the two ideals no longer exists anywhere on the Net. That fact is confirmation of the point this section aims to make: that we’re moving from one Internet to another, and the one we’re moving to will be significantly more regulable.

The following descriptions are not technical; I don’t offer them as complete definitions of types of networks or types of control. I offer them to illustrate — to sketch enough to see a far more general point.

Cyber-places: Harvard Versus Chicago

The Internet was born at universities in the United States. Its first subscribers were researchers. But as a form of life, its birth was tied to university life. It swept students online, pulling them away from life in real space. The Net was one of many intoxicants on college campuses in the mid-1990s, and its significance only grew through time. As former New York Times columnist J. C. Herz wrote in her first book about cyberspace:

When I look up, it’s four-thirty in the morning. “No way.” I look from the clock to my watch. Way. I’ve been in front of this screen for six hours, and it seems like no time at all. I’m not even remotely tired. Dazed and thirsty, but not tired. In fact, I’m euphoric. I stuff a disheveled heap of textbooks, photocopied articles, highlighters and notes into my backpack and run like a madwoman up the concrete steps, past the security guard, and outside into the predawn mist. . . .


I stop where a wet walkway meets a dry one and stand for a sec. . . . I start thinking about this thing that buzzes around the entire world, through the phone lines, all day and all night long. It’s right under our noses and it’s invisible. It’s like Narnia, or Magritte, or Star Trek, an entire goddamned world. Except it doesn’t physically exist. It’s just the collective consciousness of however many people are on it.


This really is outstandingly weird.[3]

Yet not all universities adopted the Net in the same way. Or put differently, the access universities granted was not all the same. The rules were different. The freedoms allowed were different. One example of this difference comes from two places I knew quite well, though many other examples could make the same point.

In the mid-1990s at the University of Chicago, if you wanted access to the Internet, you simply connected your machine to Ethernet jacks located throughout the university.[4] Any machine with an Ethernet connection could be plugged into these jacks. Once connected, your machine had full access to the Internet — access, that is, that was complete, anonymous, and free.

The reason for this freedom was a decision by an administrator — the then-Provost, Geoffrey Stone, a former dean of the law school and a prominent free speech scholar. When the university was designing its net, the technicians asked Stone whether anonymous communication should be permitted. Stone, citing the principle that the rules regulating speech at the university should be as protective of free speech as the First Amendment, said yes: People should have the right to communicate at the university anonymously, because the First Amendment to the Constitution guarantees the same right vis-à-vis governments. From that policy decision flowed the architecture of the University of Chicago’s net.

At Harvard, the rules are different. If you plug your machine into an Ethernet jack at the Harvard Law School, you will not gain access to the Net. You cannot connect your machine to the Net at Harvard unless the machine is registered — licensed, approved, verified. Only members of the university community can register their machines. Once registered, all interactions with the network are monitored and identified to a particular machine. To join the network, users have to “sign” a user agreement. The agreement acknowledges this pervasive practice of monitoring. Anonymous speech on this network is not permitted — it is against the rules. Access can be controlled based on who you are, and interactions can be traced based on what you did.

This design also arose from the decision of an administrator, one less focused on the protections of the First Amendment. Control was the ideal at Harvard; access was the ideal at Chicago. Harvard chose technologies that made control possible; Chicago chose technologies that made access easy.

These two networks differ in at least two important ways. First and most obviously, they differ in the values they embrace.[5] That difference is by design. At the University of Chicago, First Amendment values determined network design; different values determined Harvard’s design.

But they differ in a second way as well. Because access is controlled at Harvard and identity is known, actions can be traced back to their root in the network. Because access is not controlled at Chicago, and identity is not known, actions cannot be traced back to their root in the network. Monitoring or tracking behavior at Chicago is harder than it is at Harvard. Behavior in the Harvard network is more controllable than in the University of Chicago network.

The networks thus differ in the extent to which they make behavior within each network regulable. This difference is simply a matter of code — a difference in the software and hardware that grants users access. Different code makes differently regulable networks. Regulability is thus a function of design.

These two networks are just two points on a spectrum of possible network designs. At one extreme we might place the Internet — a network defined by a suite of protocols that are open and nonproprietary and that require no personal identification to be accessed and used. At the other extreme are traditional closed, proprietary networks, which grant access only to those with express authorization; control, therefore, is tight. In between are networks that mix elements of both. These mixed networks add a layer of control to the otherwise uncontrolled Internet. They layer elements of control on top.

Thus the original — there have been some changes in the last few years[6] — University of Chicago network was close to the norm for Internet access in the middle of the 1990s.[7] Let’s call it Net95. At the other extreme are closed networks that both predate the Internet and still exist today — for example, the ATM network, which makes it possible to get cash from your California bank at 2:00 a.m. while in Tbilisi. And in the middle are Harvard-type networks — networks that add a layer of control on top of the suite of protocols that define “the Internet.” These protocols are called “TCP/IP.” I describe them more extensively in Chapter 4. But the essential feature of the Harvard network is that this suite was supplemented. You get access to the Internet only after you’ve passed through this layer of control.

All three designs are communication networks that are “like” the Internet. But their differences raise an obvious question: When people say that the Internet is “unregulable”, which network are they describing? And if they’re talking about an unregulable network, why is it unregulable? What features in its design make it unregulable? And could those features be different?

Consider three aspects of Net95’s design that make it hard for a regulator to control behavior there. From the perspective of an anonymity-loving user, these are “features” of Net95 — aspects that make that network more valuable. But from the perspective of the regulator, these features are “bugs” — imperfections that limit the data that the Net collects, either about the user or about the material he or she is using.

The first imperfection is information about users — who the someone is who is using the Internet. In the words of the famous New Yorker cartoon of two dogs sitting in front of a PC, “On the Internet, nobody knows you’re a dog.[8]” No one knows, because the Internet protocols don’t require that you credential who you are before you use the Internet. Again, the Internet protocol doesn’t require that credential; your local access point, like the Harvard network, might. But even then, the information that ties the individual to a certain network transaction is held by the access provider. It is not a part of your Internet transaction.

The second “imperfection” is information about geography — where the someone is who is using the Internet. As I will describe more in Chapter 4, although the Internet is constituted by addresses, those addresses were initially simply logical addresses. They didn’t map to any particular location in the physical world. Thus, when I receive a packet of data sent by you through the Internet, it is certainly possible for me to know the Internet address from which your packet comes, but I will not know the physical address.

And finally, the third “imperfection” is information about use — what is the data being sent across this network; what is its use? The Internet does not require any particular labeling system for data being sent across the Internet. Again, as we’ll see in more detail below, there are norms that say something, but no rule to assure data gets distributed just according to the norms. Nothing puts the bits into a context of meaning, at least not in a way that a machine can use. Net95 had no requirement that data be labeled. “Packets” of data are labeled, in the sense of having an address. But beyond that, the packets could contain anything at all.

These three “imperfections” tie together: Because there is no simple way to know who someone is, where they come from, and what they’re doing, there is no simple way to regulate how people behave on the Net. If you can’t discover who did what and where, you can’t easily impose rules that say “don’t do this, or at least, don’t do it there.” Put differently, what you can’t know determines what you can control.

Consider an example to make the point clearer. Let’s say the state of Pennsylvania wants to block kids from porn. It thus passes a rule that says “No kid in Pennsylvania can get access to porn.” To enforce that rule, Pennsylvania has got to know (1) whether someone is a kid, (2) where they come from (i.e., Pennsylvania or Maine), and (3) what they’re looking at (porn or marzipan). Net95, however, won’t be of much help to Pennsylvania as it tries to enforce this rule. People accessing content in Pennsylvania using Net95 need not reveal anything about who they are or where they come from, and nothing in the design of Net95 requires sites to describe what content they carry. These gaps in data make regulating hard. Thus from the perspective of the regulator, these are imperfections in the Net’s original design.
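
To make those three gaps concrete, here is a minimal sketch, in Python, of the check Pennsylvania’s rule would require. It is only my illustration, and every name in it is invented; the point is that under Net95 none of the three inputs the rule needs is ever supplied.

    # A hypothetical sketch of the rule's three data requirements.
    # All names here are invented for illustration; Net95 supplies
    # none of these fields, so the check can never be evaluated.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Request:
        is_minor: Optional[bool] = None         # (1) who: a kid?
        state: Optional[str] = None             # (2) where: Pennsylvania?
        content_is_porn: Optional[bool] = None  # (3) what: porn?

    def violates_pa_rule(req: Request) -> Optional[bool]:
        """True/False if the rule can be evaluated; None if the data is missing."""
        if None in (req.is_minor, req.state, req.content_is_porn):
            return None  # the Net95 case: the network reveals none of this
        return req.is_minor and req.state == "PA" and req.content_is_porn

    print(violates_pa_rule(Request()))  # None -- unenforceable as built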

But the Harvard network suggests that it is at least possible for the “bugs” in Net95 to be eliminated. The Net could know the credentials of the user (identity and location) and the nature of the data being sent. That knowledge could be layered onto the Internet without destroying its functionality. The choice, in other words, is not between the Internet and no Internet, or between the Internet and a closed proprietary network. Harvard suggests a middle way. Architectures of control could be layered on top of the Net to “correct” or eliminate “imperfections.” And these architectures would thereby facilitate control.[9]

That is the first, very small, claim of this early chapter in a story about emerging control: Architectures of control are possible; they could be added to the Internet that we already know. If they were added, that would radically change the character of the network. Whether these architectures should be added depends upon what we want to use the network for.

I say this is a small claim because, while it is important, it is the sort of point that one recognizes as obvious even if one didn’t see it originally. More than obvious, the point should be pedestrian. We see it in lots of contexts. Think, for example, of the post office. When I was growing up, the Post Office was a haven for anonymous speech. The job of the Post Office was simply to deliver packages. Like Net95, it didn’t worry about who a piece of mail was from, or what was in the envelope or package. There was no enforced requirement that you register before you send a letter. There was no enforced requirement that the letter have a return address or that the return address be correct. If you were careful to avoid fingerprints, you could use this government-subsidized facility to send perfectly anonymous messages.

Obviously, the Post Office could be architected differently. The service could require, for example, a return address. It could require that you verify that the return address was correct (for example, by checking your ID before it accepted a package). It could even require inspection before it shipped a particular package or envelope. All of these changes in the procedures for the post would produce a world in which mail was more easily monitored and tracked. The government makes that choice when it designs the Post Office as it does. If monitoring becomes important, the government can change the system to facilitate it. If not, it can leave the postal system as it (largely) is. But if it does change the system to make monitoring simpler, that will reflect changes in values that inform the design of that network.

The claim of this book is that there are sufficient interests to move Net95 from a default of anonymity to a default of identification. But nothing I’ve said yet shows how. What would get us from the relatively unregulable libertarian Net to a highly regulable Net of control?

This is the question for the balance of Part I. I move in two steps. In Chapter 4, my claim is that even without the government’s help, we will see the Net move to an architecture of control. In Chapter 5, I sketch how government might help. The trends promise a highly regulable Net — not the libertarian’s utopia, not the Net your father (or more likely your daughter or son) knew, but a Net whose essence is the character of control.

An Internet, in other words, that flips the Internet as it was.

Chapter 4. Architectures Of Control

The Invisible Man doesn’t fear the state. He knows his nature puts him beyond its reach (unless he gets stupid, and of course, he always gets stupid). His story is the key to a general lesson: If you can’t know who someone is, or where he is, or what he’s doing, you can’t regulate him. His behavior is as he wants it to be. There’s little the state can do to change it.

So too with the original Internet: Everyone was an invisible man. As cyberspace was originally architected, there was no simple way to know who someone was, where he was, or what he was doing. As the Internet was originally architected, then, there was no simple way to regulate behavior there.

The aim of the last chapter, however, was to add a small but important point to this obvious idea: Whatever cyberspace was, there’s no reason it has to stay this way. The “nature” of the Internet is not God’s will. Its nature is simply the product of its design. That design could be different. The Net could be designed to reveal who someone is, where they are, and what they’re doing. And if it were so designed, then the Net could become, as I will argue throughout this part, the most regulable space that man has ever known.

In this chapter, I describe the changes that could push — and are pushing — the Net from the unregulable space it was to the perfectly regulable space it could be. These changes are not being architected by government. They are instead being demanded by users and deployed by commerce. They are not the product of some 1984-inspired conspiracy; they are the consequence of changes made for purely pragmatic, commercial ends.

This obviously doesn’t make these changes bad or good. My purpose just now is not normative, but descriptive. We should understand where we are going, and why, before we ask whether this is where, or who, we want to be.


The history of the future of the Internet was written in Germany in January 1995. German law regulated porn. In Bavaria, it regulated porn heavily. CompuServe made a moderate amount of porn available to its users through USENET. CompuServe was serving Bavaria’s citizens. Bavaria told CompuServe to remove the porn from its servers, or its executives would be punished.

CompuServe at first objected that there was nothing it could do — save removing the porn from every server, everywhere in the world. That didn’t trouble the Germans much, but it did trouble CompuServe. So in January 1995, CompuServe announced a technical fix: Rather than blocking access, for all of its members, to the USENET newsgroups that the Bavarians had complained about, it had devised a technology to filter content on a country-by-country basis.[1]

To make that fix work, CompuServe had to begin to reckon who a user was, what they were doing, and where they were doing it. Technology could give them access to the data that needed reckoning. And with that shift, the future was set. An obvious response to a problem of regulability would begin to repeat itself.

CompuServe, of course, was not the Internet. But its response suggests the pattern that the Internet will follow. In this chapter, I map just how the Internet can effectively be made to run (in this respect at least) like CompuServe.

Who Did What, Where?

To regulate, the state needs a way to know the who, in “Who did what, where?” To see how the Net will show the state “who”, we need to think a bit more carefully about how “identification” works in general, and how it might work on the Internet.

Identity and Authentication: Real Space

To make sense of the technologies we use to identify who someone is, consider the relationship among three familiar ideas — (1) “identity”, (2) “authentication”, and (3) “credential.”

By “identity” I mean something more than just who you are. I mean as well your “attributes”, or more broadly, all the facts about you (or a corporation, or a thing) that are true. Your identity, in this sense, includes your name, your sex, where you live, what your education is, your driver’s license number, your social security number, your purchases on Amazon.com, whether you’re a lawyer — and so on.

These attributes are known by others when they are communicated. In real space, some are communicated automatically: for most people, sex, skin color, height, age range, and whether you have a good smile get transmitted automatically. Other attributes can’t be known unless they are revealed either by you or by someone else: your GPA in high school, your favorite color, your social security number, your last purchase on Amazon, whether you’ve passed a bar exam.

Just because an attribute has been asserted, however, does not mean the attribute is believed. (“You passed the bar?!”) Rather, belief will often depend upon a process of “authentication.” In general, we “authenticate” when we want to become more confident about the truth of some asserted claim than appears on its face. “I’m married”, you say. “Show me the ring”, she says. The first statement is an assertion about an attribute you claim you have. The second is a demand for authentication. We could imagine (in a comedy at least) that demand continuing. “Oh come on, that’s not a wedding ring. Show me your marriage license.” At some point, the demands stop, either when enough confidence has been achieved, or when the inquiry has just become too weird.

Sometimes this process of authentication is relatively automatic. Some attributes, that is, are relatively self-authenticating: You say you’re a woman; I’m likely to believe it when I see you. You say you’re a native speaker; I’m likely to believe it once I speak with you. Of course, in both cases, I could be fooled. Thus, if my life depended upon it, I might take other steps to be absolutely confident of what otherwise appears plain. But for most purposes, with most familiar sorts of attributes, we learn how to evaluate without much more than our own individual judgment.

Some attributes, however, cannot be self-authenticating. You say you’re licensed to fly an airplane; I want to see the license. You say you’re a member of the California bar; I want to see your certificate. You say you’re qualified to perform open heart surgery on my father; I want to see things that make me confident that your claim is true. Once again, these authenticating “things” could be forged, and my confidence could be unjustified. But if I’m careful to match the process for authentication with the level of confidence that I need, I’m behaving quite rationally. And most of us can usually get by without a terribly complicated process of authentication.

One important tool sometimes used in this process of authentication is a credential. By “credential”, I mean a standardized device for authenticating (to some level of confidence) an assertion made. A driver’s license is a credential in this sense. Its purpose is to authenticate the status of a driver. We’re generally familiar with the form of such licenses; that gives us some confidence that we’ll be able to determine whether a particular license is valid. A passport is also a credential in this sense. Its purpose is to establish the citizenship of the person it identifies, and it identifies a person through relatively self-authenticating attributes. Once again, we are familiar with the form of this credential, and that gives us a relatively high level of confidence about the facts asserted in that passport.

Obviously, some credentials are better than others. Some are architected to give more confidence than others; some are more efficient at delivering their confidence than others. But we select among the credentials available depending upon the level of confidence that we need.

So take an obvious example to bring these points together: Imagine you’re a bank teller. Someone appears in front of you and declares that she is the owner of account #654-543231. She says she would like to withdraw all the money from that account.

In the sense I’ve described, this someone (call her Ms. X) has asserted a fact about her identity — that she is the owner of account #654-543231. Your job now is to authenticate that assertion. So you pull up on your computer the records for the account, and you discover that there’s lots of money in it. Now your desire to be confident about the authentication you make is even stronger. You ask Ms. X her name; that name matches the name on the account. That gives you some confidence. You ask Ms. X for two forms of identification. Both match Ms. X. Now you have even more confidence. You ask Ms. X to sign a withdrawal slip. The signatures seem to match; more confidence still. Finally, you note in the record that the account was established by your manager. You ask her whether she knows Ms. X. She confirms that she does, and that the “Ms. X” standing at the counter is indeed Ms. X. Now you’re sufficiently confident to turn over the money.

Notice that throughout this process, you’ve used technologies to help you authenticate the attribute asserted by Ms. X to be true. Your computer links a name to an account number. A driver’s license or passport ties a picture to a name. The computer keeps a copy of a signature. These are all technologies to increase confidence.

And notice too that we could imagine even better technologies to increase this confidence. Credit cards, for example, were developed at a time when merely possessing the credit card authenticated its use. That design creates the incentive to steal a credit card. ATM cards are different — in addition to possession, ATM cards require a password. That design reduces the value of stolen cards. But some write their passwords on their ATM cards, or keep them in their wallets with their ATM cards. This means the risk from theft is not totally removed. But that risk could be further reduced by other technologies of authentication. For example, certain biometric technologies, such as thumbprint readers or eye scans, would increase the confidence that the holder of a card was an authorized user. (Though these technologies themselves can create their own risks: At a conference I heard a vendor describing a new technology for identifying someone based upon his handprint; a participant in the conference asked whether the hand had to be alive for the authentication to work. The vendor went very pale. After a moment, he replied, “I guess not.”)

We are constantly negotiating these processes of authentication in real life, and in this process, better technologies and better credentials enable more distant authentication. In a small town, in a quieter time, credentials were not necessary. You were known by your face, and your face carried with it a reference (held in the common knowledge of the community) about your character. But as life becomes more fluid, social institutions depend upon other technologies to build confidence around important identity assertions. Credentials thus become an unavoidable tool for securing such authentication.

If technologies of authentication can be better or worse, then, obviously, many have an interest in these technologies becoming better. We each would be better off if we could more easily and confidently authenticate certain facts about us. Commerce, too, would certainly be better off with better technologies of authentication. Poor technologies beget fraud; fraud is an unproductive cost for business. If better technology could eliminate that cost, then prices could be lower and profits possibly higher.

And finally, governments benefit from better technologies of authentication. If it is simple to authenticate your age, then rules that are triggered based upon age are more easily enforced (drinking ages, or limits on cigarettes). And if it is simple to authenticate who you are, then it will be easier for the government to trace who did what.

Fundamentally, the regulability of life in real-space depends upon certain architectures of authentication. The fact that witnesses can identify who committed a crime, either because they know the person or because of self-authenticating features such as “he was a white male, six feet tall”, enhances the ability of the state to regulate against that crime. If criminals were invisible or witnesses had no memory, crime would increase. The fact that fingerprints are hard to change and are now automatically traced to convicted felons increases the likelihood that felons will be caught again. Relying on a more changeable physical characteristic would reduce the ability of the police to track repeat offenders. The fact that cars have license plates and are registered by their owners increases the likelihood that a hit-and-run driver will be caught. Without licenses, and without systems registering owners, it would be extremely difficult to track car-related crime. In all these cases, and in many more, technologies of authentication of real-space life make regulating that life possible.

These three separate interests therefore point to a common interest. That’s not to say that every technology of authentication meets that common interest, nor is it to say that these interests will be enough to facilitate more efficient authentication. But it does mean that we can see which way these interests push. Better authentication can benefit everyone.

Identity and Authentication: Cyberspace

Identity and authentication in cyberspace and real space are in theory the same. In practice they are quite different. To see that difference, however, we need to see more about the technical detail of how the Net is built.

As I’ve already said, the Internet is built from a suite of protocols referred to collectively as “TCP/IP.” At its core, the TCP/IP suite includes protocols for exchanging packets of data between two machines “on” the Net.[2] Brutally simplified, the system takes a bunch of data (a file, for example), chops it up into packets, and slaps on the address to which the packet is to be sent and the address from which it is sent. The addresses are called Internet Protocol addresses, and they look like this: 128.34.35.204. Once properly addressed, the packets are then sent across the Internet to their intended destination. Machines along the way (“routers”) look at the address to which the packet is sent, and depending upon an (increasingly complicated) algorithm, the machines decide to which machine the packet should be sent next. A packet could make many “hops” between its start and its end. But as the network becomes faster and more robust, those many hops seem almost instantaneous.
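
To picture the mechanics, here is a toy sketch in Python of this packetizing idea. It is my illustration, not a real implementation: real TCP/IP headers carry more fields, but, as the following paragraphs stress, nothing about who the sender is, where in physical space she sits, or what the data means.

    # A toy illustration of packetizing: chop data into chunks and
    # label each chunk with nothing more than a destination address,
    # a source address, and a sequence number.

    def packetize(data: bytes, src: str, dst: str, size: int = 1500):
        """Split data into packets labeled only with IP addresses."""
        return [
            {"src": src, "dst": dst, "seq": i // size, "payload": data[i:i + size]}
            for i in range(0, len(data), size)
        ]

    packets = packetize(b"A" * 4000, src="128.34.35.204", dst="10.0.0.7")
    for p in packets:
        print(p["src"], "->", p["dst"], "seq", p["seq"], len(p["payload"]), "bytes")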



In the terms I’ve described, there are many attributes that might be associated with any packet of data sent across the network. For example, the packet might come from an e-mail written by Al Gore. That means the e-mail is written by a former vice president of the United States, by a man knowledgeable about global warming, by a man over the age of 50, by a tall man, by an American citizen, by a former member of the United States Senate, and so on. Imagine also that the e-mail was written while Al Gore was in Germany, and that it is about negotiations for climate control. The identity of that packet of information might be said to include all these attributes.

But the e-mail itself authenticates none of these facts. The e-mail may say it’s from Al Gore, but the TCP/IP protocol alone gives us no way to be sure. It may have been written while Gore was in Germany, but he could have sent it through a server in Washington. And of course, while the system eventually will figure out that the packet is part of an e-mail, the information traveling across TCP/IP itself does not contain anything that would indicate what the content was. The protocol thus doesn’t authenticate who sent the packet, where they sent it from, and what the packet is. All it purports to assert is an IP address to which the packet is to be sent, and an IP address from which the packet comes. From the perspective of the network, this other information is unnecessary surplus. Like a daydreaming postal worker, the network simply moves the data and leaves its interpretation to the applications at either end.

This minimalism in the Internet’s design was not an accident. It reflects a decision about how best to design a network to perform a wide range of very different functions. Rather than build into this network a complex set of functionality thought to be needed by every single application, this network philosophy pushes complexity to the edge of the network — to the applications that run on the network, rather than the network’s core. The core is kept as simple as possible. Thus if authentication about who is using the network is necessary, that functionality should be performed by an application connected to the network, not by the network itself. Or if content needs to be encrypted, that functionality should be performed by an application connected to the network, not by the network itself.

This design principle was named by network architects Jerome Saltzer, David Clark, and David Reed the end-to-end principle.[3] It has been a core principle of the Internet’s architecture, and, in my view, one of the most important reasons that the Internet produced the innovation and growth that it has enjoyed. But one consequence of this design is that identification and authentication are both extremely difficult using the basic protocols of the Internet alone. It is as if you were in a carnival funhouse with the lights dimmed to darkness and voices coming from around you, but from people you do not know and from places you cannot identify. The system knows that there are entities out there interacting with it, but it knows nothing about who those entities are. While in real space — and here is the important point — anonymity has to be created, in cyberspace anonymity is the given.

Identity and Authentication: Regulability

This difference in the architectures of real space and cyberspace makes a big difference in the regulability of behavior in each. The absence of relatively self-authenticating facts in cyberspace makes it extremely difficult to regulate behavior there. If we could all walk around as “The Invisible Man” in real space, the same would be true about real space as well. That we’re not capable of becoming invisible in real space (or at least not easily) is an important reason that regulation can work.

Thus, for example, if a state wants to control children’s access to “indecent” speech on the Internet, the original Internet architecture provides little help. The state can say to websites, “don’t let kids see porn.” But the website operators can’t know — from the data provided by the TCP/IP protocols at least — whether the entity accessing its web page is a kid or an adult. That’s different, again, from real space. If a kid walks into a porn shop wearing a mustache and stilts, his effort to conceal is likely to fail. The attribute “being a kid” is asserted in real space, even if efforts to conceal it are possible. But in cyberspace, there’s no need to conceal, because the facts you might want to conceal about your identity (i.e., that you’re a kid) are not asserted anyway.

All this is true, at least, under the basic Internet architecture. But as the last ten years have made clear, none of this is true by necessity. To the extent that the lack of efficient technologies for authenticating facts about individuals makes it harder to regulate behavior, there are architectures that could be layered onto the TCP/IP protocol to create efficient authentication. We’re far enough into the history of the Internet to see what these technologies could look like. We’re far enough into this history to see that the trend toward this authentication is unstoppable. The only question is whether we will build into this system of authentication the kinds of protections for privacy and autonomy that are needed.

Architectures of Identification

Most who use the Internet have no real sense about whether their behavior is monitored, or traceable. Instead, the experience of the Net suggests anonymity. Wikipedia doesn’t say “Welcome Back, Larry” when I surf to its site to look up an entry, and neither does Google. Most, I expect, take this lack of acknowledgement to mean that no one is noticing.

But appearances are quite deceiving. In fact, as the Internet has matured, the technologies for linking behavior with an identity have increased dramatically. You can still take steps to assure anonymity on the Net, and many depend upon that ability to do good (human rights workers in Burma) or evil (coordinating terrorist plots). But to achieve that anonymity takes effort. For most of us, our use of the Internet has been made at least traceable in ways most of us would never even consider possible.

Consider first the traceability resulting from the basic protocols of the Internet — TCP/IP. Whenever you make a request to view a page on the Web, the web server needs to know where to send the packets of data that will appear as a web page in your browser. Your computer thus tells the web server where you are — in IP space at least — by revealing an IP address.

As I’ve already described, the IP address itself doesn’t reveal anything about who you are, or where in physical space you come from. But it does enable a certain kind of trace. If (1) you have gotten access to the web through an Internet Service Provider (ISP) that assigns you an IP address while you’re on the Internet and (2) that ISP keeps the logs of that assignment, then it’s perfectly possible to trace your surfing back to you.

How?

Well, imagine you’re angry at your boss. You think she’s a blowhard who is driving the company into bankruptcy. After months of frustration, you decide to go public. Not “public” as in a press conference, but public as in a posting to an online forum within which your company is being discussed.

You know you’d get in lots of trouble if your criticism were tied back to you. So you take steps to be “anonymous” on the forum. Maybe you create an account in the forum under a fictitious name, and that fictitious name makes you feel safe. Your boss may see the nasty post, but even if she succeeds in getting the forum host to reveal what you said when you signed up, all that stuff was bogus. Your secret, you believe, is safe.

Wrong. In addition to the identification that your username might, or might not, provide, if the forum is on the web, then it knows the IP address from which you made your post. With that IP address, and the time you made your post, using a “reverse DNS look-up”,[4] it is simple to identify the Internet Service Provider that gave you access to the Internet. And increasingly, it is relatively simple for the Internet Service Provider to check its records to reveal which account was using that IP address at that specified time. Thus, the ISP could (if required) say that it was your account that was using the IP address that posted the nasty message about your boss. Try as you will to deny it (“Hey, on the Internet, no one knows you’re a dog!”), I’d advise you to give up quickly. They’ve got you. You’ve been trapped by the Net. Dog or no, you’re definitely in the doghouse.
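
A sketch of that first step, using Python’s standard library, looks something like this. The look-up typically yields a hostname that names the ISP; matching the address to a particular account is then a matter of the ISP’s own logs, which the look-up itself cannot reach. The address here is only an example.

    # Reverse DNS look-up: map an IP address back to a hostname,
    # which usually identifies the Internet Service Provider.

    import socket

    ip = "8.8.8.8"  # example: the address the forum's server recorded
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        print(f"{ip} resolves to {hostname}")  # e.g., dns.google
    except socket.herror:
        print(f"no reverse DNS record for {ip}")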

Now again, what made this tracing possible? No plan by the NSA. No strategy of Microsoft. Instead, what made this tracing possible was a by-product of the architecture of the Web and the architecture of ISPs charging for access to the Web. The Web must know an IP address; ISPs require identification before they assign an IP address to a customer. So long as the log records of the ISP are kept, the transaction is traceable. Bottom line: If you want anonymity, use a pay phone!

This traceability in the Internet raised some important concerns at the beginning of 2006. Google announced it would fight a demand by the government to produce one million sample searches. (MSN and Yahoo! had both complied with the same request.) That request was made as part of an investigation the government was conducting to support its defense of a statute designed to block kids from porn. And though the government promised the data would be used for no other purpose, the request raised deep concerns in the Internet community. Depending upon the data that Google kept, the request showed in principle that it was possible to trace legally troubling searches back to individual IP addresses (and to individuals with Google accounts). Thus, for example, if your Internet address at work is a fixed-IP address, then every search you’ve ever made from work is at least possibly kept by Google. Does that make you concerned? And assume for the moment you are not a terrorist: Would you still be concerned?

A link back to an IP address, however, only facilitates tracing, and even then, the traceability is not perfect. ISPs don’t keep data for long (ordinarily); some don’t even keep assignment records at all. And if you’ve accessed the Internet at an Internet café, then there’s no reason to believe anything could be traced back to you. So still, the Internet provides at least some anonymity.

But IP tracing isn’t the only technology of identification that has been layered onto the Internet. A much more pervasive technology was developed early in the history of the Web to make the web more valuable to commerce and its customers. This is the technology referred to as “cookies.”

When the World Wide Web was first deployed, the protocol simply enabled people to view content that had been marked up in a special markup language. This language (HTML) made it easy to link to other pages, and it made it simple to apply basic formatting to the content (bold or italics, for example).

But the one thing the protocol didn’t enable was a simple way for a website to know which machines had accessed it. The protocol was “state-less.” When a web server received a request to serve a web page, it didn’t know anything about the state of the requester before that request was made.[5]

From the perspective of privacy, this sounds like a great feature for the Web. Why should a website know anything about me if I go to that site to view certain content? You don’t have to be a criminal to appreciate the value in anonymous browsing. Imagine libraries kept records of every time you opened a book at the library, even for just a second.

Yet from the perspective of commerce, this “feature” of the original Web is plainly a bug, and not because commercial sites necessarily want to know everything there is to know about you. Instead, the problem is much more pragmatic. Say you go to Amazon.com and indicate you want to buy 20 copies of my latest book. (Try it. It’s fun.) Now your “shopping cart” has 20 copies of my book. You then click on the icon to check out, and you notice your shopping cart is empty. Why? Well, because, as originally architected, the Web had no easy way to recognize that you were the same entity that just ordered 20 books. Or put differently, the web server would simply forget you. The Web as originally built had no way to remember you from one page to another. And thus, the Web as originally built would not be of much use to commerce.

But as I’ve said again and again, the way the Web was is not the way the Web had to be. And so those who were building the infrastructure of the Web quickly began to think through how the web could be “improved” to make it easy for commerce to happen. “Cookies” were the solution. In 1994, Netscape introduced a protocol to make it possible for a web server to deposit a small bit of data on your computer when you accessed that server. That small bit of data — the “cookie” — made it possible for the server to recognize you when you traveled to a different page. Of course, there are lots of other concerns about what that cookie might enable. We’ll get to those in the chapter about privacy. The point that’s important here, however, is not the dangers this technology creates. The point is the potential and how that potential was built. A small change in the protocol for client-server interaction now makes it possible for websites to monitor and track those who use the site.
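
A minimal sketch of that mechanism, using Python’s standard library, might look like this. It is only an illustration of the protocol change, not any real site’s code: the server deposits an identifier on the first request, and the browser returns it with every later request, letting the server recognize the machine across pages.

    # Sketch: deposit a cookie on first visit; recognize the machine after.

    import uuid
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from http.cookies import SimpleCookie

    class CookieHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            cookie = SimpleCookie(self.headers.get("Cookie", ""))
            self.send_response(200)
            if "visitor_id" in cookie:
                body = f"welcome back, machine {cookie['visitor_id'].value}"
            else:
                new_id = uuid.uuid4().hex
                self.send_header("Set-Cookie", f"visitor_id={new_id}")
                body = "first visit: cookie deposited"
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(body.encode())

    HTTPServer(("localhost", 8000), CookieHandler).serve_forever()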

This is a small step toward authenticated identity. It’s far from that, but it is a step toward it. Your computer isn’t you (yet). But cookies make it possible for the computer to authenticate that it is the same machine that was accessing a website a moment before. And it is upon this technology that the whole of web commerce initially was built. Servers could now “know” that this machine is the same machine that was here before. And from that knowledge, they could build a great deal of value.

Now again, strictly speaking, cookies are nothing more than a tracing technology. They make it simple to trace a machine across web pages. That tracing doesn’t necessarily reveal any information about the user. Just as we could follow a trail of cookie crumbs in real space to an empty room, a web server could follow a trail of “mouse droppings” from the first entry on the site until the user leaves. In both cases, nothing is necessarily revealed about the user.

But sometimes something important is revealed about the user by association with data stored elsewhere. For example, imagine you enter a site, and it asks you to reveal your name, your telephone number, and your e-mail address as a condition of entering a contest. You trust the website, and do that, and then you leave the website. The next day, you come back, and you browse through a number of pages on that website. In this interaction, of course, you’ve revealed nothing. But if a cookie was deposited on your machine through your browser (and you have not taken steps to remove it), then when you return to the site, the website again “knows” all these facts about you. The cookie traces your machine, and this trace links back to a place where you provided information the machine would not otherwise know.

The traceability of IP addresses and cookies is the default on the Internet now. Again, steps can be taken to avoid this traceability, but the vast majority of us don’t take them. Fortunately, for society and for most of us, what we do on the Net doesn’t really concern anyone. But if it did concern someone, it wouldn’t be hard to track us down. We are a people who leave our “mouse droppings” everywhere.

This default traceability, however, is not enough for some. They require something more. That was Harvard’s view, as I noted in the previous chapter. That is also the view of just about all private networks today. A variety of technologies have developed that enable stronger authentication by those who use the Net. I will describe two of these technologies in this section. But it is the second of these two that will, in my view, prove to be the most important.

The first of these technologies is the Single Sign-on (SSO) technology. This technology allows someone to “sign-on” to a network once, and then get access to a wide range of resources on that network without needing to authenticate again. Think of it as a badge you wear at your place of work. Depending upon what the badge says (“visitor” or “researcher”) you get different access to different parts of the building. And like a badge at a place of work, you get the credential by giving up other data. You give the receptionist an ID; he gives you a badge; you wear that badge wherever you go while at the business.

The most commonly deployed SSO is a system called Kerberos. But there are many different SSOs out there — Microsoft’s Passport system is an example — and there is a strong push to build federated SSOs for linking many different sites on the Internet. Thus, for example, in a federated system, I might authenticate myself to my university, but then I could move across any domain within the federation without authenticating again. The big advantage in this architecture is that I can authenticate to the institution I trust without spreading lots of data about myself to institutions I don’t trust.
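
The flavor of the idea can be sketched in a few lines of Python. This is not Kerberos, whose actual protocol is far more elaborate; the sketch only shows the shape: one trusted issuer signs a token when you sign on, and every service sharing the verification key accepts the token without asking you to authenticate again.

    # Toy single sign-on: authenticate once, present the token everywhere.

    import base64, hashlib, hmac, json

    ISSUER_KEY = b"secret-shared-by-the-federated-services"  # illustrative

    def issue_token(user):
        """The identity provider signs a token at sign-on."""
        payload = base64.urlsafe_b64encode(json.dumps({"user": user}).encode())
        sig = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
        return payload.decode() + "." + sig

    def verify_token(token):
        """Any service in the federation checks the signature, not a password."""
        payload, sig = token.split(".")
        expected = hmac.new(ISSUER_KEY, payload.encode(), hashlib.sha256).hexdigest()
        if hmac.compare_digest(sig, expected):
            return json.loads(base64.urlsafe_b64decode(payload))["user"]
        return None

    token = issue_token("alice")  # sign on once...
    print(verify_token(token))    # ...every federated service accepts it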

SSOs have been very important in building identity into the Internet. But a second technology, I believe, will become the most important tool for identification in the next ten years. This is because this alternative respects important architectural features of the Internet, and because the demand for better technologies of identification will continue to be strong. Forget the hassle of typing your name and address at every site you want to buy something from. You only need to think about the extraordinary growth in identity theft to recognize there are many who would be eager to see something better come along.

To understand this second system, think first about how credentials work in real space[6]. You’ve got a wallet. In it is likely to be a driver’s license, some credit cards, a health insurance card, an ID for where you work, and, if you’re lucky, some money. Each of these cards can be used to authenticate some fact about you — again, with very different levels of confidence. The driver’s license has a picture and a list of physical characteristics. That’s enough for a wine store, but not enough for the NSA. The credit card has your signature. Vendors are supposed to use that data to authenticate that the person who signs the bill is the owner of the card. If the vendor becomes suspicious, she might demand that you show an ID as well.

Notice the critical features of this “wallet” architecture. First, these credentials are issued by different entities. Second, depending upon their technology, they offer different levels of confidence. Third, I’m free to use these credentials in ways never originally planned or intended by the issuer of the credential. The Department of Motor Vehicles never coordinated with Visa to enable driver’s licenses to be used to authenticate the holder of a credit card. But once the one was prevalent, the other could use it. And fourth, nothing requires that I show all my cards when I can use just one. That is, to show my driver’s license, I don’t also reveal my health insurance card. Or to use my Visa, I don’t also have to reveal my American Express card.

These same features are at the core of what may prove to be the most important addition to the effective architecture of the Internet since its birth. This is a project being led by Microsoft to essentially develop an Identity Metasystem — a new layer of the Internet, an Identity Layer, that would complement the existing network layers to add a new kind of functionality. This Identity Layer is not Microsoft Passport, or some other Single Sign-On technology. Instead it is a protocol to enable a kind of virtual wallet of credentials, with all the same attributes of the credentials in your wallet — except better. This virtual wallet will not only be more reliable than the wallet in your pocket, it will also give you the ability to control more precisely what data about you is revealed to those who demand data about you.

For example, in real space, your wallet can easily be stolen. If it’s stolen, then there’s a period of time when it’s relatively easy for the thief to use the cards to buy stuff. In cyberspace, these wallets are not easily stolen. Indeed, if they’re architected well, it would be practically impossible to “steal” them. Remove the cards from their holder, and they become useless digital objects.

Or again, in real space, if you want to authenticate that you’re over 21 and therefore can buy a six-pack of beer, you show the clerk your driver’s license. With that, he authenticates your age. But with that bit of data, he also gets access to your name, your address, and in some states, your social security number. Those other bits of data are not necessary for him to know. In some contexts, depending on how creepy he is, these data are exactly the sort you don’t want him to know. But the inefficiencies of real-space technologies reveal these data. This loss of privacy is a cost of doing business.

The virtual wallet would be different. If you need to authenticate your age, the technology could authenticate that fact alone — indeed, it could authenticate simply that you’re over 21, or over 65, or under 18, without revealing anything more. Or if you need to authenticate your citizenship, that fact can be certified without revealing your name, or where you live, or your passport number. The technology is crafted to reveal just what you want it to reveal, without also revealing other stuff. (As one of the key architects for this metasystem, Kim Cameron, described it: “To me, that’s the center of the system.[7]”) And, most importantly, using the power of cryptography, the protocol makes it possible for the other side to be confident about the fact you reveal without requiring any more data.
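
A toy sketch can show the shape of this selective disclosure, though only the shape: real metasystem designs use public-key cryptography and stronger protocols, where this illustration uses a shared secret and invented names.

    # Toy selective disclosure: certify one bare fact and nothing else.

    import hashlib, hmac

    ISSUER_KEY = b"dmv-signing-key"  # illustrative; real systems use public keys

    def issue_claim(fact):
        """The issuer signs the single fact -- no name, address, or birth date."""
        sig = hmac.new(ISSUER_KEY, fact.encode(), hashlib.sha256).hexdigest()
        return fact, sig

    def verify_claim(fact, sig):
        """The clerk checks the signature and learns only the fact itself."""
        expected = hmac.new(ISSUER_KEY, fact.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(sig, expected)

    claim, sig = issue_claim("over_21=true")
    print(verify_claim(claim, sig))  # True: age certified, nothing else revealed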

The brilliance in this solution to the problems of identification is first that it mirrors the basic architecture of the Internet. There’s no central repository for data; there’s no network technology that everyone must adopt. There is instead a platform for building identity technologies that encourages competition among different privacy and security providers — TCP/IP for identity. Microsoft may be leading the project, but anyone can build for this protocol. Nothing ties the protocol to the Windows operating system. Or to any other specific vendor. As Cameron wisely puts it, “it can’t be owned by any one company or any one country . . . or just have the technology stamp of any one engineer.[8]”

The Identity Layer is infrastructure for the Internet. It gives value (and raises concerns) to many beyond Microsoft. But though Microsoft’s work is an important gift to the Internet, the Identity Layer is not altruism. “Microsoft’s strategy is based on web services”, Cameron explained to me. “Web services are impossible without identity.[9]” There is important public value here, but private interest is driving the deployment of this public value.

The Identity Layer would benefit individuals, businesses, and the government, but each differently. Individuals could more easily protect themselves from identity theft[10]; if you get an e-mail claiming to be from PayPal demanding you update your account, you’ll know whether the website is actually PayPal’s. Or if you want to protect yourself against spam, you could block all e-mail that doesn’t come from an authenticated server. In either case, the technology is increasing confidence about the Internet. And the harms that come from a lack of confidence — mainly fraud — would therefore be reduced.

Commerce too would benefit from this form of technology. It too benefits from the reduction of fraud. And it too would benefit from a more secure infrastructure for conducting online transactions.

And finally, the government would benefit from this infrastructure of trust. If there were a simple way to demand that people authenticate facts about themselves, it would be easier for the government to insist that they do so. If it were easier to have high confidence that the person on the website was who he said he was, then it would be cheaper to deliver certain information across the web.

But while individuals, commerce, and government would all benefit from this sort of technology, there is also something that each could lose.

Individuals right now can be effectively anonymous on the Net. A platform for authenticated identity would make anonymity much harder. We might imagine, for example, a norm developing to block access to a website by anyone not carrying a token that at least made it possible to trace back to the user — a kind of driver’s license for the Internet. That norm, plus this technology, would make anonymous speech extremely difficult.

Commerce could also lose something from this design. To the extent that there are simple ways to authenticate that I am the authorized user of this credit card, for example, it’s less necessary for websites to demand all sorts of data about me — my address, my telephone numbers, and in one case I recently encountered, my birthday. That fact could build a norm against revealing extraneous data. But that data may be valuable to business beyond simply confirming a charge.

And governments, too, may lose something from this architecture of identification. Just as commerce may lose the extra data that individuals now must reveal to authenticate themselves, so too will the government lose access to that data. It may feel that such data is necessary for some other purpose, but gathering it would become more difficult.

Each of these benefits and costs can be adjusted, depending upon how the technology is implemented. And as the resulting mix of privacy and security is the product of competition and an equilibrium between individuals and businesses, there’s no way up front to predict what it will be.

But for our purposes, the only important fact to notice is that this infrastructure could effectively answer the first question that regulability requires answering: Who did what where? With an infrastructure enabling cheap identification wherever you are, the frequency of unidentified activity falls dramatically.


This final example of an identification technology throws into relief an important fact about encryption technology. The Identity Layer depends upon cryptography. It thus demonstrates the sense in which cryptography is Janus-faced. As Stewart Baker and Paul Hurst put it, cryptography “surely is the best of technologies and the worst of technologies. It will stop crimes and it will create new crimes. It will undermine dictatorships, and it will drive them to new excesses. It will make us all anonymous, and it will track our every transaction.[11]”

Cryptography can be all these things, both good and bad, because encryption can serve two fundamentally different ends. In its “confidentiality” function it can be “used to keep communications secret.” In its “identification” function it can be “used to provide forgery-proof digital identities.[12]” It enables freedom from regulation (as it enhances confidentiality), but it can also enable more efficient regulation (as it enhances identification).[13]

Its traditional use is secrets. Encrypt a message, and only those with the proper key can open and read it. This type of encryption has been around as long as language itself. But until the mid-1970s it suffered from an important weakness: the same key that was used to encrypt a message was also used to decrypt it. So if that key fell into the wrong hands, every message hidden with it was rendered vulnerable; if a large number of messages were encrypted with the same key, losing control of that one key compromised the whole archive of secrets it protected. And that risk was hard to avoid: you always had to “transport” the key needed to unlock the message, and inherent in that transport was the risk that the key would be lost.

In the mid-1970s, however, a breakthrough in encryption technique was announced by two computer scientists, Whitfield Diffie and Martin Hellman[14]. Rather than relying on a single key, the Diffie-Hellman system used two keys — one public, the other private. What is encrypted with one can be decrypted only with the other. Even with one key there is no way to infer the other.
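The two-key idea is easy to demonstrate with a modern library. Here is a minimal sketch using RSA, a public-key system published shortly after Diffie and Hellman’s paper, and the widely used Python cryptography package; it is an illustration of the principle, not a hardened configuration.

```python
# Minimal sketch of two-key ("public key") encryption with RSA, using
# the third-party "cryptography" package (pip install cryptography).
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()   # this half can be published freely

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Anyone who knows the public key can lock a message...
ciphertext = public_key.encrypt(b"meet me at dawn", oaep)

# ...but only the holder of the private key can unlock it.
print(private_key.decrypt(ciphertext, oaep))   # b'meet me at dawn'
```

Run in the other direction, the same pair supplies the identification function: a message signed with the private key can be verified by anyone who holds the public key.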

This discovery was the clue to an architecture that could build an extraordinary range of confidence into any network, whether or not the physical network itself was secure[15]. And again, that confidence could both make me confident that my secrets won’t be revealed and make me confident that the person using my site just now is you. The technology therefore works to keep secrets, but it also makes it harder to keep secrets. It works to make stuff less regulable, and more regulable.

In the Internet’s first life, encryption technology was on the side of privacy. Its most common use was to keep information secret. But in the Internet’s next life, encryption technology’s most important role will be in making the Net more regulable. As an Identity Layer gets built into the Net, the easy ability to demand some form of identity as a condition to accessing the resources of the Net increases. As that ability increases, its prevalence will increase as well. Indeed, as Shawn Helms describes, the next generation of the Internet Protocol — IPv6 — “marks each packet with an encryption ‘key’ that cannot be altered or forged, thus securely identifying the packet’s origin. This authentication function can identify every sender and receiver of information over the Internet, thus making it nearly impossible for people to remain anonymous on the Internet.[16]”

And even if not impossible, sufficiently difficult for the vast majority of us. Our packets will be marked. We — or something about us — will be known.

Who Did What, Where?

Regulability also depends upon knowing the “what” in “who did what, where?” But again, the Internet as originally designed didn’t help the regulator here either. If the Internet protocol simply cuts up data into packets and stamps an address on them, then nothing in the basic protocol would tell anyone looking at the packet what the packet was for.

For example, imagine you’re a telephone company providing broadband Internet access (DSL) across your telephone lines. Some smart innovator develops Voice-over-IP (VOIP) — an application that makes it possible to use the Internet to make telephone calls. You, the phone company, aren’t happy about that, because now people using your DSL service can make unmetered telephone calls. That freedom cuts into your profit.

Is there anything you can do about this? Relying upon just the Internet protocols, the answer is no. The “packets” of data that contain the simulated-telephone calls look just like any packet of data. They don’t come labeled with VOIP or any other consistent moniker. Instead, packets are simply marked with addresses. They are not marked with explanations of what is going on with each.

But as my example is meant to suggest, we can easily understand why some would be very keen to understand what packets are flowing across their network, and not just for anti-competitive purposes. Network administrators trying to decide whether to add new capacity need to know what the existing capacity is being used for. Businesses keen to avoid their employees wasting time with sports or porn have a strong interest in knowing just what their employees are doing. Universities trying to avoid viruses or malware being installed on network computers need to know what kind of packets are flowing onto their network. In all these cases, there’s an obvious and valid will to identify what packets are flowing on the network. And as they say, where there’s a will, there’s a way.

The way follows the same technique described in the section above. Again, the TCP/IP protocol doesn’t include technology for identifying the content carried in TCP/IP packets. But it also doesn’t interfere with applications that might examine TCP/IP packets and report what those packets are about.
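As a sketch of how such an application might work, consider a few lines of Python using the scapy packet library. Classifying by port number, as here, is the crudest form of inspection; the commercial products described next look deeper, into the payloads themselves. The port-to-label table is my own invention for the example.

```python
# Sketch of an application layered on top of TCP/IP that reports what
# packets are "about". Uses the third-party scapy library (pip install
# scapy); sniffing typically requires administrator privileges.
from collections import Counter
from scapy.all import sniff, IP, TCP, UDP

PORT_LABELS = {25: "e-mail", 80: "web", 443: "web (TLS)", 5060: "VOIP signaling"}
tally = Counter()

def classify(pkt):
    if IP not in pkt:
        return
    if TCP in pkt:
        port = pkt[TCP].dport
    elif UDP in pkt:
        port = pkt[UDP].dport
    else:
        return
    tally[PORT_LABELS.get(port, "unknown")] += 1

sniff(prn=classify, count=200)   # examine the next 200 packets
print(tally)                     # e.g., Counter({'web (TLS)': 143, 'unknown': 57})
```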

So, for example, consider a product from Ipanema Technologies. This technology enables a network owner to inspect the packets traveling on its network. As its webpage promises,

The Ipanema Systems “deep” layer 7 packet inspection automatically recognizes all critical business and recreational application flows running over the network. Real-time graphical interfaces as well as minute-by-minute reports are available to rapidly discover newly deployed applications.[17]

Using the data gathered by this technology, the system generates reports about the applications being used in the network, and who’s using them. These technologies make it possible to control network use, either to economize on bandwidth costs, or to block uses that the network owner doesn’t permit.

Another example of this kind of content control is a product called “iProtectYou.[18]” This product also scans packets on a network, but this control is implemented at the level of a particular machine. Parents load this software on a computer; the software then monitors all network traffic with that computer. As the company describes, the program can then “filter harmful websites and newsgroups; restrict Internet time to a predetermined schedule; decide which programs can have Internet access; limit the amount of data that can be sent or received to/from your computer; block e-mails, online chats, instant messages and P2P connections containing inappropriate words; and produce detailed Internet activity logs.” Once again, this is an application that sits on top of the network and watches. It intervenes in network activity when it identifies the activity as the kind the administrator wants to control.

In addition to these technologies of control, programmers have developed a wide range of programs to monitor networks. Perhaps the dominant application in this context is called “nmap” — a program

for network exploration or security auditing . . . designed to rapidly scan large networks. . . . Nmap uses raw IP packets in novel ways to determine what hosts are available on the network, what services (application name and version) those hosts are offering, what operating systems (and OS versions) they are running, what type of packet filters/firewalls are in use, and dozens of other characteristics.[19]

This software is “free software”, meaning the source code is available, and any modifications of the source code must be made available as well. These conditions essentially guarantee that the code necessary to engage in this monitoring will always be available.

Finally, coders have developed “packet filtering” technology, which, as one popular example describes, “is the selective passing or blocking of data packets as they pass through a network interface. . . . The most often used criteria are source and destination address, source and destination port, and protocol.” This again is a technology that monitors “what” is carried within packets, and decides what’s allowed based upon what it finds.
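The decision logic of such a filter is simple enough to sketch directly. Here is a toy rule engine in Python that matches on exactly the criteria quoted above (addresses, ports, protocol); the rules are invented examples, and a real filter such as the one just quoted runs inside the kernel rather than in a script.

```python
# Toy packet filter: pass or block based on source/destination address,
# destination port, and protocol -- the criteria real filters match on.
# The packet structure and example rules are simplified for illustration.
from dataclasses import dataclass

@dataclass
class Packet:
    src: str
    dst: str
    dport: int
    proto: str

RULES = [  # first matching rule wins
    {"action": "block", "proto": "udp", "dport": 5060},  # no VOIP signaling
    {"action": "block", "src": "10.0.0.99"},             # a banned host
    {"action": "pass"},                                  # default: allow
]

def filter_packet(pkt: Packet) -> str:
    for rule in RULES:
        if all(getattr(pkt, field) == want
               for field, want in rule.items() if field != "action"):
            return rule["action"]
    return "block"

print(filter_packet(Packet("10.0.0.5", "192.0.2.7", 5060, "udp")))  # block
print(filter_packet(Packet("10.0.0.5", "192.0.2.7", 443, "tcp")))   # pass
```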

In each of these cases, a layer of code complements the TCP/IP protocol, to give network administrators something TCP/IP alone would not — namely, knowledge about “what” is carried in the network packets. That knowledge increases the “regulability” of network use. If a company doesn’t want its employees using IM chat, then these technologies will enforce that rule — by blocking the packets containing IM chat. Or if a company wants to know which employees use sexually explicit speech in Internet communication, these technologies will reveal that as well. Again, there are plenty of perfectly respectable reasons why network administrators might want to exercise this regulatory authority — even if there are plenty of cases where such power would be an abuse. Because of this legitimate demand, software products like this are developed.

Now, of course, there are countermeasures that users can adopt to avoid just this sort of monitoring. A user who encrypts the data he sends across the network will avoid any filtering on the basis of key words. And there are plenty of technologies designed to “anonymize” behavior on the Net, so administrators can’t easily know what an individual is doing on a network. But these countermeasures require a significant investment for a particular user to deploy — whether of time or money. The vast majority won’t bother, and the ability of network administrators to monitor content and use of the network will be preserved.

Thus, as with changes that increased the ability to identify “who” someone is who is using a network, here too, private interests provide a sufficient incentive to develop technologies that make it increasingly easy to say “what” someone is doing who is using a network. A gap in the knowledge provided by the plain vanilla Internet is thus plugged by these privately developed technologies.

Who Did What, Where?

Finally, as long as different jurisdictions impose different requirements, the third bit of data necessary to regulate efficiently is knowing where the target of regulation is. If France forbids the selling of Nazi paraphernalia, but the United States does not, then a website wanting to respect the laws of France must know something about where the person accessing the Internet is coming from.

But once again, the Internet protocols didn’t provide that data. And thus, it would be extremely difficult to regulate or zone access to content on the basis of geography.

As one court described the network as it was originally deployed:

The Internet is wholly insensitive to geographic distinctions. In almost every case, users of the Internet neither know nor care about the physical location of the Internet resources they access. Internet protocols were designed to ignore rather than document geographic location; while computers on the network do have “addresses”, they are logical addresses on the network rather than geographic addresses in real space. The majority of Internet addresses contain no geographic clues and, even where an Internet address provides such a clue, it may be misleading.[20]

But once again, commerce has come to the rescue of regulability. There are obvious reasons why it would be useful to be able to identify where someone is when he accesses a website. Some of those reasons have to do with regulation — again, blocking Nazi material from the French, or porn from kids in Kansas. We’ll consider these reasons more extensively later in this book. For now, however, the most interesting reasons are those tied purely to commerce. And, again, these commercial reasons are sufficient to induce the development of this technology.

Once again, the gap in the data necessary to identify someone’s location is the product of the way IP addresses are assigned. IP addresses are virtual addresses; they don’t refer to a particular geographic place. They refer to a logical place on the network. Thus, two IP addresses in principle could be very close to each other in number, but very far from each other in geography. That’s not the way, for example, zip codes work. If your zip code is one digit from mine (e.g., 94115 vs. 94116), we’re practically neighbors.

But this gap is simply a gap in what can be deduced directly from an IP address. While there is no way to read off the address 23.214.23.15 itself that its user is in California, it is certainly possible to gather the data necessary to map IP addresses to geographic locations. To do this, one constructs a table linking IP addresses to places, and then tracks both the ultimate IP address and the path along which a packet has traveled from where it was sent to where you are. Thus while the TCP/IP protocol can’t reveal where someone is directly, it can be used indirectly to reveal at least the origin or destination of an IP packet.
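The mechanics are easy to sketch. A geolocation service boils down to a sorted table of numeric address ranges and a binary search; the two ranges below are invented for the example, while real services compile millions of them from registry and routing data.

```python
# Sketch of IP-to-geography mapping: turn the dotted address into a
# number, then binary-search a table of (range start, range end, place).
# The table entries are invented for illustration.
import bisect
import ipaddress

def ip_num(s: str) -> int:
    return int(ipaddress.ip_address(s))

GEO_TABLE = sorted([
    (ip_num("23.214.0.0"), ip_num("23.214.255.255"), "California, US"),
    (ip_num("81.2.69.0"), ip_num("81.2.69.255"), "London, UK"),
])
STARTS = [row[0] for row in GEO_TABLE]

def geolocate(ip: str) -> str:
    n = ip_num(ip)
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and GEO_TABLE[i][0] <= n <= GEO_TABLE[i][1]:
        return GEO_TABLE[i][2]
    return "unknown"

print(geolocate("23.214.23.15"))   # California, US (per our invented table)
print(geolocate("198.51.100.1"))   # unknown -- no matching range
```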

The commercial motivations for this knowledge are obvious. Jack Goldsmith and Tim Wu tell the story of a particularly famous entrepreneur, Cyril Houri, who was inspired to develop IP mapping technology. Sitting in his hotel in Paris one night, he accessed his e-mail account in the United States. His e-mail was hosted on a web server, but he noticed that the banner ads at the top of the website were advertising an American flower company. That gave him a (now obvious) idea: Why not build a tool to make it easy for a website to know from where it is being accessed, so it can serve relevant ads to those users?[21]

Houri’s idea has been copied by many. Geoselect, for example, is a company that provides IP mapping services. Just browse to their webpage, and they’re 99 percent likely to be able to tell you automatically where you are browsing from. Using their services, you can get a geographical report listing the location of the people who visit your site, and you can use their products to automatically update log files on your web server with geographic data. You can automatically change the greeting on your website depending upon where the user comes from, and you can automatically redirect a user based upon her location. All of this functionality is invisible to the user. All he sees is a web page constructed by tools that know something TCP/IP alone doesn’t reveal — where someone is from.

So what commercial reasons do websites have for using such software? One company, MaxMind[22], lists the major reason as credit card fraud: If your customer comes from a “high risk IP address” — meaning a location where it’s likely the person is engaged in credit card fraud — then MaxMind’s service will flag the transaction and direct that it have greater security verification. MaxMind also promises the service will be valuable for “targeted advertising.” Using its product, a client can target a message based upon country, state, or city, as well as a “metropolitan code”, an area code, and connection speed of the user (no need to advertise DVD downloads to a person on a dial-up connection).

Here too there is an important and powerful open source application that provides the same IP mapping functions. Hostip.info gives website operators — for free — the ability to “geolocate” the users of their site[23]. This again means that the core functionality of IP mapping is not held exclusively by corporations or a few individuals. Any application developer — including a government — could incorporate the function into its applications. The knowledge and functionality is free.

Thus, again, one of the original gaps in the data necessary to make behavior regulable on the Internet — geographic identity — has been filled. But it has not been filled by government mandate or secret NSA operations (or so I hope). Instead, the gap has been filled by a commercial interest in providing the data the network itself didn’t. Technology now layers onto the Internet to produce the data the network needs.

But it is still possible to evade identification. Civil liberty activist Seth Finkelstein has testified to the relative ease with which one can evade this tracking.[24] Yet as I will describe more below, even easily evaded tracking can be effective tracking. And when tied to the architectures for identity described above, this sort will become quite effective.

Results

In the last chapter, we saw that the unregulability of the Internet was a product of design: that the failure of that network to identify who someone is, what they’re doing, and where they’re from meant that it would be particularly difficult to enforce rules upon individuals using the network. Not impossible, but difficult. Not for all people, but for enough to matter. The Internet as it originally was gave everyone a “Ring of Gyges”, the ring which, as Plato reports in The Republic, made Gyges the shepherd invisible. The dilemma for regulation in such a world is precisely the fear Plato had about this ring: With such a ring, “no man can be imagined to be of such an iron nature that he would stand fast in justice.[25]”

And if such a man did choose justice, even with the power of the ring, then “he would be thought by the lookers-on to be a most wretched idiot, although they would praise him to one another’s faces, and keep up appearances with one another from a fear that they too might suffer injustice.”

But these gaps in the Internet’s original design are not necessary. We can imagine networks that interact seamlessly with the Internet but which don’t have these “imperfections.” And, more importantly, we can see why there would be an important commercial interest in eliminating these gaps.

Yet you may still be skeptical. Even if most Internet activity is traceable using the technologies that I’ve described, you may still believe there are significant gaps. Indeed, the explosion of spam, viruses, ID theft, and the like are strong testimony to the fact that there’s still a lot of unregulable behavior. Commerce acting alone has not yet eliminated these threats, to both commerce and civil life. For reasons I explore later in this book, it’s not even clear commerce could.

But commerce is not the only actor here. Government is also an important ally, and the framework of regulability that commerce has built could be built on again by government.

Government can, in other words, help commerce and help itself. How it does so is the subject of the chapter that follows.

Chapter 5. Regulating Code

Commerce has done its part — for commerce, and indirectly, for governments. Technologies that make commerce more efficient are also technologies that make regulation simpler. The one supports the other. There are a host of technologies now that make it easier to know who someone is on the Net, what they’re doing, and where they’re doing it. These technologies were built to make business work better. They make life on the Internet safer. But the by-product of these technologies is to make the Net more regulable.

More regulable. Not perfectly regulable. These tools alone do a great deal. As Joel Reidenberg notes, they are already leading courts to recognize how behavior on the Net can be reached — and regulated.[1] But they don’t yet create the incentives to build regulability into the heart of the Net. That final step will require action by the government.[2]

When I wrote the first version of this book, I certainly expected that the government would eventually take these steps. Events since 1999 — including the birth of Z-theory described below — have only increased my confidence. In the United States, the identification of “an enemy” — terrorism — has weakened the resolve to resist government action to make government more powerful and regulation more effective. There’s a limit, or at least I hope there is, but there is also no doubt that the line has been moved. And in any case, there is not much more that the government would need to do in order to radically increase the regulability of the net. These steps would not themselves excite any significant resistance. The government has the means, and the motive. This chapter maps the opportunity.

The trick is obvious once it is seen. It may well be difficult for the government to regulate behavior directly, given the architecture of the Internet as it is. But that doesn’t mean it is difficult for the government to regulate the architecture of the Internet as it is. The trick, then, is for the government to take steps that induce the development of an architecture that makes behavior more regulable.

In this context, I don’t mean by “architecture” the regulation of TCP/IP itself. Instead, I simply mean regulation that changes the effective constraints of the architecture of the Internet, by altering the code at any layer within that space. If technologies of identification are lacking, then regulating the architecture in this sense means steps the government can take to induce the deployment of technologies of identification.

If the government takes these steps, it will increase the regulability of behavior on the Internet. And depending upon the substance of these steps taken, it could render the Internet the most perfectly regulable space we’ve known. As Michael Geist describes it, “governments may have been willing to step aside during the commercial Internet’s nascent years, but no longer.”[3]

Regulating Architecture: The Regulatory Two-Step

We can call this the “regulatory two-step”: In a context in which behavior is relatively unregulable, the government takes steps to increase regulability. And once framed, there are any number of examples that set the pattern for the two-step in cyberspace.

Car Congestion

London had a problem with traffic. There were too many cars in the central district, and there was no simple way to keep “unnecessary” cars out.

So London did three things. It first mandated a license plate that a video camera could read, and then it installed video cameras on as many public fixtures as it would take to monitor — perpetually — what cars were where.

Then, beginning in February 2003, the city imposed a congestion tax: Initially £5 per day (between 7 a.m. and 6:30 p.m.) for any car (save taxis and residents paying a special fee), raised to £8 in July 2005. After 18 months in operation, the system was working “better than expected.” Traffic delays were down 32 percent, traffic within the city was down 15 percent, and delays on main routes into the zones were down 20 percent. London is now exploring new technologies to make it even easier to charge for access more accurately. These include new tagging technologies, as well as GPS and GSM technologies that would monitor the car while within London.[4]

Telephones

The architecture of telephone networks has undergone a radical shift in the past decade. After resisting the design of the Internet for many years[5], telephone networks are now shifting from circuit-switched to packet-switched networks. As with the Internet, packets of information are spewed across the system, and nothing ensures that they will travel in the same way, or along the same path. Packets take the most efficient path, which depends on the demand at any one time.

This design, however, creates problems for law enforcement — in particular, for that part of law enforcement that depends upon wiretaps to do its job. In the circuit-switched network, it was relatively simple to identify which wires to tap. In the packet-switched network, where there are no predictable paths for packets of data to travel, wiretapping becomes much more difficult.

At least it is difficult under one design of a packet-switched network. Different designs will be differently difficult. And that potential led Congress in 1994 to enact the Communications Assistance for Law Enforcement Act (CALEA). CALEA requires that networks be designed to preserve the ability of law enforcement to conduct electronic surveillance. This requirement has been negotiated in a series of “safe harbor” agreements that specify the standards networks must meet to satisfy the requirements of the law.

CALEA is a classic example of the kind of regulation that I mean this chapter to flag. The industry created one network architecture. That architecture didn’t adequately serve the interests of government. The response of the government was to regulate the design of the network so it better served the government’s ends. (Luckily for the networks, the government, at least initially, agreed to pick up part of the cost.[6]) As Susan Crawford writes,

Most critically for the future of the Internet, law enforcement . . . has made clear that it wants to ensure that it reviews all possibly relevant new services for compliance with unstated information-gathering and information-forwarding requirements before these services are launched. All prudent businesses will want to run their services by law enforcement, suggests the DOJ: “Service providers would be well advised to seek guidance early, preferably well before deployment of a service, if they believe that their service is not covered by CALEA. . . . DOJ would certainly consider a service provider’s failure to request such guidance in any enforcement action.”[7]

CALEA is a “signal”, Crawford describes, that the “FCC may take the view that permission will be needed from government authorities when designing a wide variety of services, computers, and web sites that use the Internet protocol. . . . Information flow membranes will be governmentally mandated as part of the design process for online products and services.[8]” That hint has continued: In August 2005, the Federal Communications Commission (FCC) ruled that Voice-over-IP services “must be designed so as to make government wiretapping easier.”[9]

Of course, regulating the architecture of the network was not the only means that Congress had. Congress could have compensated for any loss in crime prevention that resulted from the decreased ability to wiretap by increasing criminal punishments.[10] Or Congress could have increased the resources devoted to criminal investigation. Both of these changes would have altered the incentives that criminals face without using the network’s potential to help track and convict criminals. But instead, Congress acted to change the architecture of the telephone networks, thus using the networks directly to change the incentives of criminals indirectly.

This is law regulating code. Its indirect effect is to improve law enforcement, and it does so by modifying code-based constraints on law enforcement.

Regulation like this works well with telephone companies. There are few companies, and the regulation is relatively easy to verify. Telephone companies are thus regulable intermediaries: Rules directed against them are likely to be enforced.

But what about when telephone service (or rather “telephone service”) begins to be carried across the Internet? Vonage, or Skype, rather than Bell South? Are these entities similarly regulable?[11]

The answer is that they are, though for different reasons. Skype and Vonage, as well as many other VOIP providers, seek to maximize their value as corporations. That value comes in part from demonstrating reliably regulable behavior. Failing to comply with the rules of the United States government is not a foundation upon which to build a healthy, profitable company. That’s as true for General Motors as it is for eBay.

Telephones: Part 2

Four years after Congress enacted CALEA, the FBI petitioned the Federal Communications Commission to enhance even further government’s power to regulate. Among the amendments the FBI proposed was a regulation requiring disclosure of the locations of individuals using cellular phones: the phone companies would have to report the cell tower from which a call was served.[12] Cellular phone systems need this data to ensure seamless switching between transmitters. But beyond this and billing, the phone companies have no further need for this information.

The FBI, however, has interests beyond those of the companies. It would like that data made available whenever it has a “legitimate law enforcement reason” for requesting it. The proposed amendment to CALEA would require the cellular company to provide this information, which is a way of indirectly requiring that it write its code to make the information retrievable.[13]

The original motivation for this requirement was reasonable enough: Emergency service providers needed a simple way to determine where an emergency cellular phone call was coming from. Thus, revealing location data was necessary, at least in those cases. But the FBI was keen to extend the reach of location data beyond cases where someone was calling 911, so they pushed to require the collection of this information whenever a call is made.

So far, the FBI has been successful in its requests with the regulators but less so with courts. But the limits the courts have imposed simply require the FBI to meet a high burden of proof to get access to the data. Whatever the standard, the effect of the regulation has been to force cell phone companies to build their systems to collect and preserve a kind of data that only aids the government.

Data Retention

Computers gather data about how they’re used. These data are collected in logs. The logs can be verbose or not — meaning they might gather lots of data, or little. And the more they gather, the easier it will be to trace who did what.

Governments are beginning to recognize this. And some are making sure they can take advantage of it. The United States is beginning to “mull”[14], and the European Union has adopted, legislation to regulate “data generated or processed in connection with the provision of publicly available electronic communications,” by requiring that providers retain specified data to better enable law enforcement. This includes data to determine the source, destination, time, duration, type, and equipment used in a given communication.[15] Rules such as this will build a layer of traceability into the platform of electronic communication, making it easier for governments to track individual behavior. (By contrast, in 2006, Congressman Ed Markey of Massachusetts proposed legislation to forbid certain Internet companies, primarily search engines, from keeping logs that make Internet behavior traceable.[16] We’ll see how far that proposed rule gets.)
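The categories such rules enumerate map naturally onto a log record. As a sketch (the field names are mine, not any directive’s legal definitions), a retention-friendly record might look like this:

```python
# Sketch of a data-retention record holding the categories named above:
# source, destination, time, duration, type, and equipment. The field
# names are illustrative, not legal definitions.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RetainedRecord:
    source: str        # originating subscriber or IP address
    destination: str   # dialed number or destination address
    start: datetime    # date and time the communication began
    duration_s: int    # duration, in seconds
    comm_type: str     # e.g. "telephony", "e-mail", "internet access"
    equipment: str     # device identifier, e.g. an IMEI or MAC address

record = RetainedRecord("203.0.113.7", "198.51.100.9",
                        datetime(2006, 3, 15, 9, 30), 420,
                        "e-mail", "00:1A:2B:3C:4D:5E")
print(record)
```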

Encryption

The examples so far have involved regulations directed to code writers as a way indirectly to change behavior. But sometimes, the government is doubly indirect: Sometimes it creates market incentives as a way to change code writing, so that the code writing will indirectly change behavior. An example is the U.S. government’s failed attempt to secure Clipper as the standard for encryption technology.[17]

I have already sketched the Janus-faced nature of encryption: The same technology enables both confidentiality and identification. The government is concerned with the confidentiality part. Encryption allows individuals to make their conversations or data exchanges untranslatable except by someone with a key. How untranslatable is a matter of debate,[18] but we can put that debate aside for the moment, because, regardless, it is too untranslatable for the government’s liking. So the government sought to control the use of encryption technology by getting the Clipper chip accepted as a standard for encryption.

The mechanics of the Clipper chip are not easily summarized, but its aim was to encourage encryption technologies that left a back door open for the government.[19] A conversation could be encrypted so that others could not understand it, but the government would have the ability (in most cases with a court order) to decrypt the conversation using a special key.
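The escrow idea itself can be sketched abstractly. In the sketch below, each message travels with a copy of its session key sealed under an escrow public key, so whoever holds the escrow private key (in the official story, only with a court order) can recover the conversation. This is a schematic of the concept, not Clipper’s actual design, and it uses the Python cryptography package.

```python
# Schematic of key escrow, the idea behind the Clipper chip (not its
# actual mechanics): every message carries its session key sealed under
# an escrow public key assumed to be baked into the device.
import os
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

escrow_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
escrow_public = escrow_private.public_key()
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def send(plaintext: bytes):
    session_key = AESGCM.generate_key(bit_length=128)
    nonce = os.urandom(12)
    body = AESGCM(session_key).encrypt(nonce, plaintext, None)
    access_field = escrow_public.encrypt(session_key, oaep)  # the "back door"
    return nonce, body, access_field

nonce, body, access_field = send(b"strictly between us")

# Ordinary eavesdroppers see only ciphertext. The escrow-key holder,
# in most cases only after a court order, can recover the session key.
recovered = escrow_private.decrypt(access_field, oaep)
print(AESGCM(recovered).decrypt(nonce, body, None))   # b'strictly between us'
```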

The question for the government then was how it could spread the Clipper chip technology. At first, the Clinton administration thought that the best way was simply to ban all other encryption technology. This strategy proved very controversial, so the government then fixed on a different technique: It subsidized the development and deployment of the Clipper chip.[20]

The thinking was obvious: If the government could get industry to use Clipper by making Clipper the cheapest technology, then it could indirectly regulate the use of encryption. The market would do the regulation for the government.[21]

The subsidy plan failed. Skepticism about the quality of the code itself, and about the secrecy with which it had been developed, as well as strong opposition to any governmentally directed encryption regime (especially a U.S.-sponsored regime), led most to reject the technology. This forced the government to take another path.

That alternative is for our purposes the most interesting. For a time, some were pushing for authority to regulate authors of encryption code directly — with a requirement that they build into their code a back door through which the government could gain access.[22] While the proposals have been various, they all aim at ensuring that the government has a way to crack whatever encryption code a user selects.

Compared with other strategies — banning the use of encryption or flooding the market with an alternative encryption standard — this mode presents a number of advantages.

First, unlike banning the use of encryption, this mode of regulation does not directly interfere with the rights of use by individuals. It therefore is not vulnerable to a strong, if as yet unproven, constitutional claim that an individual has a right “to speak through encryption.” It aims only to change the mix of encryption technologies available, not to control directly any particular use by an individual. State regulation of the writing of encryption code is just like state regulation of the design of automobiles: Individual use is not regulated. Second, unlike the technique of subsidizing one market solution, this solution allows the market to compete to provide the best encryption system, given this regulatory constraint. Finally, unlike both other solutions, this one involves the regulation of only a relatively small number of actors, since manufacturers of encryption technology are far fewer in number than users or buyers of encryption systems.

Like the other examples in this section, then, this solution is an example of the government regulating code directly so as to better regulate behavior indirectly; the government uses the architecture of the code to reach a particular substantive end. Here the end, as with digital telephony, is to ensure that the government’s ability to search certain conversations is not blocked by emerging technology. And again, the government pursues that end not by regulating primary behavior but by regulating the conditions under which primary behavior happens.

Regulating Code to Increase Regulability

All five of these examples address a behavior that the government wants to regulate, but which it cannot (easily) regulate directly. In all five, the government thus regulates that behavior indirectly by directly regulating technologies that affect that behavior. Those regulated technologies in turn influence or constrain the targeted behavior differently. They “influence the development of code.”[23] They are regulations of code that in turn make behavior more regulable.

The question that began this chapter was whether there were similar ways that the government might regulate code on the Internet to make behavior on the Net more regulable. The answer is obviously yes. There are many steps the government might take to make behavior on the network more regulable, and there are obvious reasons for taking those steps.

If done properly, these steps would reduce and isolate untraceable Internet behavior. That in turn would increase the probability that bad behavior would be detected. Increased detection would significantly reduce the expected return from maliciousness. For some significant range of malevolent actors, that shift would drive their bad behavior elsewhere.

This would not work perfectly, of course. No effort of control could ever be perfect in either assuring traceability or tracking misbehavior. But perfection is not the standard. The question is whether the government could put enough incentives into the mix of the network to induce a shift towards traceability as a default. For obvious reasons, again, the answer is yes.

The General Form

If the government’s aim is to facilitate traceability, that can be achieved by attaching an identity to actors on the network. One conceivable way to do that would be to require network providers to block actions by individuals not displaying a government-issued ID. That strategy, however, is unlikely, as it is politically impossible. Americans are antsy enough about a national identity card;[24] they are not likely to be interested in an Internet identity card.

But even if the government can’t force cyber citizens to carry IDs, it is not difficult to create strong incentives for individuals to carry IDs. There is no requirement that all citizens have a driver’s license, but you would find it very hard to get around without one, even if you do not drive. The government does not require that you keep state-issued identification on your person, but if you want to fly to another city, you must show at least one form of it. The point is obvious: Make the incentive to carry ID so strong that it tips the normal requirements of interacting on the Net.

In the same way, the government could create incentives to enable digital IDs, not by regulating individuals directly but by regulating intermediaries. Intermediaries are fewer, their interests are usually commercial, and they are ordinarily pliant targets of regulation. ISPs will be the “most important and obvious” targets — “focal points of Internet control.”[25]

Consider first the means the government has to induce the spread of “digital IDs.” I will then describe more what these “digital IDs” would have to be.

First, government means:

• Sites on the Net have the ability to condition access based on whether someone carries the proper credential. The government has the power to require sites to impose this condition. For example, the state could require that gambling sites check the age and residency of anyone trying to use the site. Many sites could be required to check the citizenship of potential users, or any number of other credentials. As more and more sites complied with this requirement, individuals would have a greater and greater incentive to carry the proper credentials. The more credentials they carried, the easier it would be to impose regulations on them.[26]

• The government could give a tax break to anyone who filed his or her income tax with a proper credential.

• The government could impose a 10 percent Internet sales tax and then exempt anyone who purchased goods with a certificate that authenticated their state of residence; the state would then be able to collect whatever local tax applied when it was informed of the purchase.[27]

• The government could charge users for government publications unless they gained access to the site with a properly authenticated certificate.

• As in other Western democracies, the government could mandate voting[28] — and then establish Internet voting; voters would come to the virtual polls with a digital identity that certified them as registered.

• The government could make credit card companies liable for the full cost of any credit card or debit card online fraud whenever the transaction was processed without a qualified ID.

• The government could require the establishment of a secure registry of e-mail servers that would be used to fight spam. That list would encourage others to begin to require some further level of authentication before sending e-mail. That authentication could be supplied by a digital ID.

The effect of each of these strategies would be to increase the prevalence of digital IDs. And at some point, there would be a tipping. There is an obvious benefit to many on the Net to be able to increase confidence about the entity with whom they are dealing. These digital IDs would be a tool to increase that confidence. Thus, even if a site permits itself to be accessed without any certification by the user, any step beyond that initial contact could require carrying the proper ID. The norm would be to travel in cyberspace with an ID; those who refuse would find the cyberspace that they could inhabit radically reduced.

The consequence of this tipping would be to effectively stamp every action on the Internet — at a minimum — with a kind of digital fingerprint. That fingerprint — at a minimum — would enable authorities to trace any action back to the party responsible for it. That tracing — at a minimum — could require judicial oversight before any trace could be effected. And that oversight — at a minimum — could track the ordinary requirements of the Fourth Amendment.

At a minimum. For the critical part in this story is not that the government could induce an ID-rich Internet. Obviously it could. Instead, the important question is the kind of ID-rich Internet the government induces.

Compare two very different sorts of digital IDs, both of which we can understand in terms of the “wallet” metaphor used in Chapter 4 to describe the evolving technology of identity that Microsoft is helping to lead.

One sort of ID would work like this: Every time you need to identify yourself, you turn over your wallet. The party demanding identification rummages through the wallet, gathering whatever data he wants.

The second sort of ID works along the lines of the Identity Layer described in Chapter 4: When you need to identify yourself, you can provide the minimal identification necessary. So if you need to certify that you’re an American, only that bit gets revealed. Or if you need to certify that you’re over 18, only that fact gets revealed.

On the model of the second form of the digital ID, it then becomes possible to imagine an ultra-minimal ID — an identification that reveals nothing on its face, but facilitates traceability: a kind of digital fingerprint that is meaningless unless decoded, and, once decoded, links back to a responsible agent.
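A sketch may help fix the idea. Below, the token a user presents is just her identifier encrypted under a key held by a tracing authority: websites see an opaque blob, and only the key holder, under whatever judicial process applies, can link it back. The flow is invented for illustration; a real design would also have to keep tokens from being linkable across sites.

```python
# Toy "ultra-minimal ID": reveals nothing on its face, but can be traced
# by the authority holding the key. Uses the third-party "cryptography"
# package; names and flow are invented for illustration.
from cryptography.fernet import Fernet

authority_key = Fernet.generate_key()   # held by the tracing authority
authority = Fernet(authority_key)

def mint_token(user_id: str) -> bytes:
    """Issued to the user; looks like random bytes to any website."""
    return authority.encrypt(user_id.encode())

def trace(token: bytes) -> str:
    """Possible only for the key holder -- e.g., upon a court order."""
    return authority.decrypt(token).decode()

token = mint_token("resident-4711")
print(token[:16])      # opaque to the site that sees it
print(trace(token))    # 'resident-4711', recoverable only with the key
```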

These two architectures stand at opposite ends of a spectrum. They produce radically different consequences for privacy and anonymity. Perfect anonymity is possible with neither; the minimal effect of both is to make behavior traceable. But with the second mode, that traceability itself can be heavily regulated. Thus, there should be no possible traceability when the only action at issue is protected speech. And where a trace is to be permitted, it should only be permitted if authorized by proper judicial action. Thus the system would preserve the capacity to identify who did what when, but it would only realize that capacity under authorized circumstances.

The difference between these two ID-enabled worlds, then, is all the difference in the world. And critically, which world we get depends completely upon the values that guide the development of this architecture. ID-type 1 would be a disaster for privacy as well as security. ID-type 2 could radically increase privacy, as well as security, for all except those whose behavior can legitimately be tracked.

Now, the feasibility of the government effecting either ID depends crucially upon the target of regulation. It depends upon there being an entity responsible for the code that individuals use, and it requires that these entities can be effectively regulated. Is this assumption really true? The government may be able to regulate the telephone companies, but can it regulate a diversity of code writers? In particular, can it regulate code writers who are committed to resisting precisely such regulation?

In a world where the code writers were the sort of people who governed the Internet Engineering Task Force[29] of a few years ago, the answer is probably no. The underpaid heroes who built the Net have ideological reasons to resist government’s mandate. They were not likely to yield to its threats. Thus, they would provide an important check on the government’s power over the architectures of cyberspace.

But as code writing becomes commercial — as it becomes the product of a smaller number of large companies — the government’s ability to regulate it increases. The more money there is at stake, the less inclined businesses (and their backers) are to bear the costs of promoting an ideology.

The best example is the history of encryption. From the very start of the debate over the government’s control of encryption, techies have argued that such regulations are silly. Code can always be exported; bits know no borders. So the idea that a law of Congress would control the flow of code was, these people argued, absurd.

The fact is, however, that the regulations had a substantial effect. Not on the techies — who could easily get encryption technologies from any number of places on the Net — but on the businesses writing software that would incorporate such technology. Netscape or IBM was not about to build and sell software in violation of U.S. regulations. The United States has a fairly powerful threat against these two companies. As the techies predicted, regulation did not control the flow of bits. But it did quite substantially inhibit the development of software that would use these bits.[30]

The effect has been profound. Companies that were once bastions of unregulability are now becoming producers of technologies that facilitate regulation. For example, Network Associates, inheritor of the encryption program PGP, was originally a strong opponent of regulation of encryption; now it offers products that facilitate corporate control of encryption and recovery of keys.[31] Key recovery creates a corporate back door, which, in many contexts, is far less restricted than a governmental back door.

Cisco is a second example.[32] In 1998 Cisco announced a router product that would enable an ISP to encrypt Internet traffic at the link level — between gateways, that is.[33] But this router would also have a switch that would disable the encryption of the router data and facilitate the collection of unencrypted Internet traffic. This switch could be flipped at the government’s command; in other words, the data would be encrypted only when the government allowed it to be.

The point in both cases is that the government is a player in the market for software. It affects the market both by creating rules and by purchasing products. Either way, it influences the supply of commercial software providers who exist to provide what the market demands.

Veterans of the early days of the Net might ask these suppliers, “How could you?”

“It’s just business”, is the obvious reply.

East Coast and West Coast Codes

Throughout this section, I’ve been speaking of two sorts of code. One is the “code” that Congress enacts (as in the tax code or “the U.S. Code”). Congress passes an endless array of statutes that say in words how to behave. Some statutes direct people; others direct companies; some direct bureaucrats. The technique is as old as government itself: using commands to control. In our country, it is a primarily East Coast (Washington, D.C.) activity. Call it “East Coast Code.”

The other is the code that code writers “enact” — the instructions embedded in the software and hardware that make cyberspace work. This is code in its modern sense. It regulates in the ways I’ve begun to describe. The code of Net95, for example, regulated to disable centralized control; code that encrypts regulates to protect privacy. In our country (MIT excepted), this kind of code writing is increasingly a West Coast (Silicon Valley, Redmond) activity. We can call it “West Coast Code.”

West Coast and East Coast Code can get along perfectly when they’re not paying much attention to each other. Each, that is, can regulate within its own domain. But the story of this chapter is “When East Meets West”: what happens when East Coast Code recognizes how West Coast Code affects regulability, and when East Coast Code sees how it might interact with West Coast Code to induce it to regulate differently.

This interaction has changed. The power of East Coast Code over West Coast Code has increased. When software was the product of hackers and individuals located outside of any institution of effective control (for example, the University of Illinois or MIT), East Coast Code could do little to control West Coast Code.[34] But as code has become the product of companies, the power of East Coast Code has increased. When commerce writes code, then code can be controlled, because commercial entities can be controlled. Thus, the power of East over West increases as West Coast Code becomes increasingly commercial.

There is a long history of power moving west. It tells of the clash of ways between the old and the new. The pattern is familiar. The East reaches out to control the West; the West resists. But that resistance is never complete. Values from the East become integrated with the West. The new takes on a bit of the old.

That is precisely what is happening on the Internet. When West Coast Code was born, there was little in its DNA that cared at all about East Coast Code concerns. The Internet’s aim was end-to-end communication. Regulation at the middle was simply disabled.

Over time, the concerns of East Coast Coders have become much more salient. Everyone hates the pathologies of the Internet — viruses, ID theft, and spam, to pick the least controversial. That universal hatred has warmed West Coast Coders to finding a remedy. They are now primed for the influence East Coast Code requires: adding complements to the Internet architecture that will bring regulability to the Net.

Now, some will continue to resist my claim that the government can effect a regulable Net. This resistance has a common form: Even if architectures of identification emerge, and even if they become common, there is nothing to show that they will become universal, and nothing to show that at any one time they could not be evaded. Individuals can always work around these technologies of identity. No control that they could effect would ever be perfect.

True. The control of an ID-rich Internet would never be complete. There will always be ways to escape.

But there is an important fallacy lurking in the argument: Just because perfect control is not possible does not mean that effective control is not possible. Locks can be picked, but that does not mean locks are useless. In the context of the Internet, even partial control would have powerful effects.

A fundamental principle of bovinity is operating here and elsewhere. Tiny controls, consistently enforced, are enough to direct very large animals. The controls of a certificate-rich Internet are tiny, I agree. But we are large animals. I think it is as likely that the majority of people would resist these small but efficient regulators of the Net as it is that cows would resist wire fences. This is who we are, and this is why these regulations work.

So imagine the world in which we all could establish our credentials simply by looking into a camera or swiping a finger across a thumbprint reader. In a second, without easily forgotten passwords or easily forged authentication, we would get access to the Net, with all of the attributes that are ours, reliably and simply assertable.

What will happen then? When you can choose between remembering a pass-phrase, typing it every time you want access to your computer, and simply using your thumb to authenticate who you are? Or if not your thumb, then your iris, or whatever body part turns out to be cheapest to certify? When it is easiest simply to give identity up, will anyone resist?

If this is selling your soul, then trust that there are truly wonderful benefits to be had. Imagine a world where all your documents exist on the Internet in a “virtual private network”, accessible by you from any machine on the Net and perfectly secured by a biometric key.[35] You could sit at any machine, call up your documents, do your work, answer your e-mail, and move on — everything perfectly secure and safe, locked up by a key certified by the markings in your eye.

This is the easiest and most efficient architecture to imagine. And it comes at (what some think) is a very low price — authentication. Just say who you are, plug into an architecture that certifies facts about you, give your identity away, and all this could be yours.

Z-Theory

“So, like, it didn’t happen, Lessig. You said in 1999 that commerce and government would work together to build the perfectly regulable net. As I look through my spam-infested inbox, while my virus checker runs in the background, I wonder what you think now. Whatever was possible hasn’t happened. Doesn’t that show that you’re wrong?”

So writes a friend to me as I began this project to update Code v1. And while I never actually said anything about when the change I was predicting would happen, there is something in the criticism. The theory of Code v1 is missing a part: Whatever incentives there are to push in small ways to the perfectly regulable Net, the theory doesn’t explain what would motivate the final push. What gets us over the tipping point?

The answer is not fully written, but its introduction was published this year. In May 2006, the Harvard Law Review gave Professor Jonathan Zittrain (hence “Z-theory”) 67 pages to explain “The Generative Internet.”[36] The article is brilliant; the book will be even better; and the argument is the missing piece in Code v1.

Much of The Generative Internet will be familiar to readers of this book. General-purpose computers plus an end-to-end network, Zittrain argues, have produced an extraordinarily innovative (“generative”) platform for invention. We celebrate the good stuff this platform has produced. But we (I especially) who so celebrate don’t pay enough attention to the bad. For the very same design that makes it possible for an Indian immigrant to invent HoTMaiL, or Stanford dropouts to create Google, also makes it possible for malcontents to create viruses and worse. These sorts use the generative Internet to generate evil. And as Zittrain rightly observes, we’ve just begun to see the evil this malware will produce. Consider just a few of his examples:

• In 2003, in a test designed to measure how quickly spammers would find an unprotected “open relay” server through which they could send their spam undetected, spammers found the test server within 10 hours. Within 66 hours they had sent more than 3.3 million messages to 229,468 people.[37]

• In 2004, the Sasser worm was able to compromise more than 500,000 computers — in just 3 days.[38] The year before, the Slammer worm had infected 90 percent of the machines running a vulnerable Microsoft server product — in just 15 minutes.[39]

• In 2003, the SoBig.F e-mail virus accounted for almost 70 percent of the e-mails sent while it was spreading. More than 23.2 million messages were sent to AOL users alone.[40]

These are of course not isolated events. They are instead part of a growing pattern. As the U.S. Computer Emergency Readiness Team calculates, there has been an explosion of security incidents reported to CERT. Here is the graph Zittrain produced from the data:[41]



The graph ends in 2004 because CERT concluded that the incidents were so “commonplace and widespread as to be indistinguishable from one another.”[42]

That there is malware on the Internet isn’t surprising. That it is growing isn’t surprising either. What is surprising is that, so far at least, this malware has not been as destructive as it could be. Given the ability of malware authors to get their malicious code on many machines very quickly, why haven’t more tried to do real harm?

For example, imagine a worm that worked itself onto a million machines and, in a synchronized attack, simultaneously erased the hard drives of all one million machines. Zittrain’s point is not that this is easy, but rather that it is no more difficult than the kind of worm that is already successfully spreading itself everywhere. So why doesn’t one of the malicious code writers do real damage? What’s stopping cyber-Armageddon?

The answer is that there’s no good answer. And when there’s no good explanation for why something hasn’t happened yet, there’s good reason to worry that it will. And when it does happen — when a malware author produces a devastatingly destructive worm — that will trigger the political resolve to do what so far governments have not done: push to complete the work of transforming the Net into a regulable space.

This is the crucial (and once you see it, obvious) insight of Z-theory. Terror motivates radical change. Think about, for example, the changes in law enforcement (and the protection of civil rights) effected by the “Patriot Act.”[43] This massive piece of legislation was enacted 45 days after the terror attacks on 9/11. But most of that bill had been written long before 9/11. The authors knew that until there was a serious terrorist attack, there would be insufficient political will to change law enforcement significantly. But once the trigger of 9/11 was pulled, radical change was possible.

The same will be true of the Internet. The malware we’ve seen so far has caused great damage. We’ve suffered this damage as annoyance rather than threat. But when the Internet’s equivalent of 9/11 happens — whether sponsored by “terrorists” or not — annoyance will mature into political will. And that political will will produce real change.

Zittrain’s aim is to prepare us for that change. His powerful and extensive analysis works through the trade-offs we could make as we change the Internet into something less generative. And while his analysis is worthy of a book of its own, I’ll let him write it. My goal in pointing to it here is to outline an answer that plugs the hole in the theory of Code v1. Code v1 described the means. Z-theory provides the motive.


There was an awful movie released in 1996 called Independence Day. The story is about an invasion by aliens. When the aliens first appear, many earthlings are eager to welcome them. For these idealists, there is no reason to assume hostility, and so a general joy spreads among the hopeful across the globe in reaction to what before had seemed just a dream: really cool alien life.

Soon after the aliens appear, however, and well into the celebration, the mood changes. Quite suddenly, Earth’s leaders realize that the intentions of these aliens are not at all friendly. Indeed, they are quite hostile. Within a very short time of this realization, Earth is captured. (Only Jeff Goldblum realizes what’s going on beforehand, but he always gets it first.)

My story here is similar (though I hope not as awful). We have been as welcoming and joyous about the Net as the earthlings were about the aliens in Independence Day; we have accepted its growth in our lives without questioning its final effect. But at some point, we too will come to see a potential threat. We will see that cyberspace does not guarantee its own freedom but instead carries an extraordinary potential for control. And then we will ask: How should we respond?

I have spent many pages making a point that some may find obvious. But I have found that, for some reason, the people for whom this point should be most important do not get it. Too many take this freedom as nature. Too many believe liberty will take care of itself. Too many miss how different architectures embed different values, and that only by selecting these different architectures — these different codes — can we establish and promote our values.

Now it should be apparent why I began this book with an account of the rediscovery of the role for self-government, or control, that has marked recent history in post-Communist Europe. Market forces encourage architectures of identity to facilitate online commerce. Government needs to do very little — indeed, nothing at all — to induce just this sort of development. The market forces are too powerful; the potential here is too great. If anything is certain, it is that an architecture of identity will develop on the Net — and thereby fundamentally transform its regulability.

But isn’t it clear that government should do something to make this architecture consistent with important public values? If commerce is going to define the emerging architectures of cyberspace, isn’t the role of government to ensure that those public values that are not in commerce’s interest are also built into the architecture?

Architecture is a kind of law: It determines what people can and cannot do. When commercial interests determine the architecture, they create a kind of privatized law. I am not against private enterprise; my strong presumption in most cases is to let the market produce. But isn’t it absolutely clear that there must be limits to this presumption? That public values are not exhausted by the sum of what IBM might desire? That what is good for America Online is not necessarily good for America?

Ordinarily, when we describe competing collections of values, and the choices we make among them, we call these choices “political.” They are choices about how the world will be ordered and about which values will be given precedence.

Choices among values, choices about regulation, about control, choices about the definition of spaces of freedom — all this is the stuff of politics. Code codifies values, and yet, oddly, most people speak as if code were just a question of engineering. Or as if code is best left to the market. Or best left unaddressed by government.

But these attitudes are mistaken. Politics is that process by which we collectively decide how we should live. That is not to say it is a space where we collectivize — a collective can choose a libertarian form of government. The point is not the substance of the choice. The point about politics is process. Politics is the process by which we reason about how things ought to be.

Two decades ago, in a powerful trilogy drawing together a movement in legal theory, Roberto Unger preached that “it’s all politics.”[44] He meant that we should not accept that any part of what defines the world is removed from politics — everything should be considered “up for grabs” and subject to reform.

Many believed Unger was arguing that we should put everything up for grabs all the time, that nothing should be certain or fixed, that everything should be in constant flux. But that is not what he meant.

His meaning was instead just this: That we should interrogate the necessities of any particular social order and ask whether they are in fact necessities, and we should demand that those necessities justify the powers that they order. As Bruce Ackerman puts it, we must ask of every exercise of power: Why?[45] Perhaps not exactly at the moment when the power is exercised, but sometime.

“Power,” in this account, is just another word for constraints that humans can do something about. Meteors crashing to earth are not “power” within the domain of “it’s all politics.” Where the meteor hits is not politics, though the consequences may well be. Where it hits, instead, is nothing we can do anything about.

But the architecture of cyberspace is power in this sense; how it is could be different. Politics is about how we decide, how that power is exercised, and by whom.

If code is law, then, as William Mitchell writes, “control of code is power”: “For citizens of cyberspace, . . . code . . . is becoming a crucial focus of political contest. Who shall write that software that increasingly structures our daily lives?”[46] As the world is now, code writers are increasingly lawmakers. They determine what the defaults of the Internet will be; whether privacy will be protected; the degree to which anonymity will be allowed; the extent to which access will be guaranteed. They are the ones who set its nature. Their decisions, now made in the interstices of how the Net is coded, define what the Net is.

How the code regulates, who the code writers are, and who controls the code writers — these are questions on which any practice of justice must focus in the age of cyberspace. The answers reveal how cyberspace is regulated. My claim in this part of the book is that cyberspace is regulated by its code, and that this code is changing.

We are entering an age when the power of regulation will be relocated to a structure whose properties and possibilities are fundamentally different. As I said about Russia at the start of this book, one form of power may be destroyed, but another is taking its place.

Our aim must be to understand this power and to ask whether it is properly exercised. As David Brin asks, “If we admire the Net, should not a burden of proof fall on those who would change the basic assumptions that brought it about in the first place?”[47]

These “basic assumptions” were grounded in liberty and openness. An invisible hand now threatens both. We need to understand how.


One example of the developing struggle over cyber freedoms is the still-not-free China. The Chinese government has taken an increasingly aggressive stand against behavior in cyberspace that violates real-space norms. Purveyors of porn get 10 years in jail. Critics of the government get the same. If this is the people’s republic, this is the people’s tough love.

To make these prosecutions possible, the Chinese need the help of network providers. And local law requires that network providers in China help. So story after story now reports major network providers — including Yahoo! and Microsoft — helping the government do the sort of stuff that would make our Constitution cringe.

The extremes are bad enough. But the more revealing example of the pattern I’m describing here is Google. Google is (rightly) famous for its fantastic search engine. Its brand has been built on the idea that no irrelevant factor controls its search results. Companies can buy search words, but their results are bracketed and separate from the main search results. The central search results — that part of the screen your eyes instinctively go to — are not to be tampered with.

Unless the company seeking to tamper with the results is China, Inc. For China, Google has promised to build a special routine.[48] Sites China wants to block won’t appear in the Google.CN search engine. No notice will be presented. No system will inform searchers that the search results they are reading have been filtered by Chinese censors. Instead, to the Chinese viewer, this will look like normal old Google. And because Google is so great, the Chinese government knows most will be driven to Google, even if Google filters what the government doesn’t want its people to have.
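
And the routine itself would be trivial to build. Here is a hypothetical sketch, again in Python and in no sense Google’s actual code: results hosted on a blocklisted domain are simply dropped, and nothing marks the gap. The blocklist and URLs are invented for the example.

    # Hypothetical sketch of silent search filtering -- illustrative
    # only; the blocklist and URLs are invented for the example.
    from urllib.parse import urlparse

    BLOCKED_DOMAINS = {"example-dissident.org", "example-news.net"}

    def censored_results(results: list) -> list:
        """Drop any result hosted on a blocked domain. No placeholder,
        no notice: the filtered page looks like ordinary results."""
        return [url for url in results
                if urlparse(url).hostname not in BLOCKED_DOMAINS]

    results = [
        "https://example-dissident.org/report",
        "https://example.com/recipes",
    ]
    print(censored_results(results))  # only the innocuous link survives

The simplicity is the point: from the code’s perspective, censorship is nothing more than a membership test the user never sees.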

Here is the perfect dance of commerce with government. Google can build the technology the Chinese need to enable China’s regulation more perfectly, and China can extract that talent from Google by mandating it as a condition of being in China’s market.

The value of that market is thus worth more to Google than the value of its “neutral search” principle. Or at least, it better be, if this deal makes any sense.

My purpose here is not to criticize Google — or Microsoft, or Yahoo! These companies have stockholders; maximizing corporate value is their charge. Were I running any of these companies, I’m not sure I would have acted differently.

But that in the end is my point: Commerce has a purpose, and government can exploit that purpose to its own end. It will do so more and more frequently, and when it does, the character of the Net will change.

Radically so.
