Harnessing the Power of Disruptive TechnologiesEdited by Andy Oram
0-596-00110-X, Order Number: 110X
448 pages, $29.95
A Network of Peers
Peer-to-Peer Models Through the History of the Internet
Nelson Minar and Marc Hedlund, Popular Power
The Internet is a shared resource, a cooperative network built out of millions of hosts all over the world. Today there are more applications than ever that want to use the network, consume bandwidth, and send packets far and wide. Since 1994, the general public has been racing to join the community of computers on the Internet, placing strain on the most basic of resources: network bandwidth. And the increasing reliance on the Internet for critical applications has brought with it new security requirements, resulting in firewalls that strongly partition the Net into pieces. Through rain and snow and congested Network Access Providers (NAPs), the email goes through, and the system has scaled vastly beyond its original design.
In the year 2000, though, something has changed--or, perhaps, reverted. The network model that survived the enormous growth of the previous five years has been turned on its head. What was down has become up; what was passive is now active. Through the music-sharing application called Napster, and the larger movement dubbed "peer-to-peer," the millions of users connecting to the Internet have started using their ever more powerful home computers for more than just browsing the Web and trading email. Instead, machines in the home and on the desktop are connecting to each other directly, forming groups and collaborating to become user-created search engines, virtual supercomputers, and filesystems.
Not everyone thinks this is such a great idea. Some objections (dealt with elsewhere in this volume) cite legal or moral concerns. Other problems are technical. Many network providers, having set up their systems with the idea that users would spend most of their time downloading data from central servers, have economic objections to peer-to-peer models. Some have begun to cut off access to peer-to-peer services on the basis that they violate user agreements and consume too much bandwidth (for illicit purposes, at that). As reported by the online News.com site, a third of U.S. colleges surveyed have banned Napster because students using it have sometimes saturated campus networks.
In our own company, Popular Power, we have encountered many of these problems as we create a peer-to-peer distributed computing resource out of millions of computers all over the Internet. We have identified many specific problems where the Internet architecture has been strained; we have also found work-arounds for many of these problems and have come to understand what true solutions would be like. Surprisingly, we often find ourselves looking back to the Internet of 10 or 15 years ago to consider how best to solve a problem.
The original Internet was fundamentally designed as a peer-to-peer system. Over time it has become increasingly client/server, with millions of consumer clients communicating with a relatively privileged set of servers. The current crop of peer-to-peer applications is using the Internet much as it was originally designed: as a medium for communication for machines that share resources with each other as equals. Because this network model is more revolutionary for its scale and its particular implementations than for its concept, a good number of past Internet applications can provide lessons to architects of new peer-to-peer applications. In some cases, designers of current applications can learn from distributed Internet systems like Usenet and the Domain Name System (DNS); in others, the changes that the Internet has undergone during its commercialization may need to be reversed or modified to accommodate new peer-to-peer applications. In either case, the lessons these systems provide are instructive, and may help us, as application designers, avoid causing the death of the Internet.
A revisionist history of peer-to-peer (1969-1995)
The Internet as originally conceived in the late 1960s was a peer-to-peer system. The goal of the original ARPANET was to share computing resources around the U.S. The challenge for this effort was to integrate different kinds of existing networks as well as future technologies with one common network architecture that would allow every host to be an equal player. The first few hosts on the ARPANET--UCLA, SRI, UCSB, and the University of Utah--were already independent computing sites with equal status. The ARPANET connected them together not in a master/slave or client/server relationship, but rather as equal computing peers.
The early Internet was also much more open and free than today's network. Firewalls were unknown until the late 1980s. Generally, any two machines on the Internet could send packets to each other. The Net was the playground of cooperative researchers who generally did not need protection from each other. The protocols and systems were obscure and specialized enough that security break-ins were rare and generally harmless. As we shall see later, the modern Internet is much more partitioned.
The early "killer apps" of the Internet, FTP and Telnet, were themselves client/server applications. A Telnet client logged into a compute server, and an FTP client sent and received files from a file server. But while a single application was client/server, the usage patterns as a whole were symmetric. Every host on the Net could FTP or Telnet to any other host, and in the early days of minicomputers and mainframes, the servers usually acted as clients as well.
This fundamental symmetry is what made the Internet so radical. In turn, it enabled a variety of more complex systems such as Usenet and DNS that used peer-to-peer communication patterns in an interesting fashion. In subsequent years, the Internet has become more and more restricted to client/server-type applications. But as peer-to-peer applications become common again, we believe the Internet must revert to its initial design.
Let's look at two long-established fixtures of computer networking that include important peer-to-peer components: Usenet and DNS.
Usenet news implements a decentralized model of control that in some ways is the grandfather of today's new peer-to-peer applications such as Gnutella and Freenet. Fundamentally, Usenet is a system that, using no central control, copies files between computers. Since Usenet has been around since 1979, it offers a number of lessons and is worth considering for contemporary file-sharing applications.
The Usenet system was originally based on a facility called the Unix-to-Unix-copy protocol, or UUCP. UUCP was a mechanism by which one Unix machine would automatically dial another, exchange files with it, and disconnect. This mechanism allowed Unix sites to exchange email, files, system patches, or other messages. The Usenet used UUCP to exchange messages within a set of topics, so that students at the University of North Carolina and Duke University could each "post" messages to a topic, read messages from others on the same topic, and trade messages between the two schools. The Usenet grew from these original two hosts to hundreds of thousands of sites. As the network grew, so did the number and structure of the topics in which a message could be posted. Usenet today uses a TCP/IP-based protocol known as the Network News Transport Protocol (NNTP), which allows two machines on the Usenet network to discover new newsgroups efficiently and exchange new messages in each group.
The basic model of Usenet provides a great deal of local control and relatively simple administration. A Usenet site joins the rest of the world by setting up a news exchange connection with at least one other news server on the Usenet network. Today, exchange is typically provided by a company's ISP. The administrator tells the company's news server to get in touch with the ISP's news server and exchange messages on a regular schedule. Company employees contact the company's local news server, and transact with it to read and post news messages. When a user in the company posts a new message in a newsgroup, the next time the company news server contacts the ISP's server it will notify the ISP's server that it has a new article and then transmit that article. At the same time, the ISP's server sends its new articles to the company's server.
Today, the volume of Usenet traffic is enormous, and not every server will want to carry the full complement of newsgroups or messages. The company administrator can control the size of the news installation by specifying which newsgroups the server will carry. In addition, the administrator can specify an expiration time by group or hierarchy, so that articles in a newsgroup will be retained for that time period but no longer. These controls allow each organization to voluntarily join the network on its own terms. Many organizations decide not to carry newsgroups that transmit sexually oriented or illegal material. This is a distinct difference from, say, Freenet, which (as a design choice) does not let a user know what material he or she has received.
Usenet has evolved some of the best examples of decentralized control structures on the Net. There is no central authority that controls the news system. The addition of new newsgroups to the main topic hierarchy is controlled by a rigorous democratic process, using the Usenet group news.admin to propose and discuss the creation of new groups. After a new group is proposed and discussed for a set period of time, anyone with an email address may submit an email vote for or against the proposal. If a newsgroup vote passes, a new group message is sent and propagated through the Usenet network.
There is even an institutionalized form of anarchy, the alt.* hierarchy, that subverts the news.admin process in a codified way. An alt newsgroup can be added at any time by anybody, but sites that don't want to deal with the resulting absurdity can avoid the whole hierarchy. The beauty of Usenet is that each of the participating hosts can set their own local policies, but the network as a whole functions through the cooperation and good will of the community. Many of the peer-to-peer systems currently emerging have not yet effectively addressed decentralized control as a goal. Others, such as Freenet, deliberately avoid giving local administrators control over the content of their machines because this control would weaken the political aims of the system. In each case, the interesting question is: how much control can or should the local administrator have?
NNTP as a protocol contains a number of optimizations that modern peer-to-peer systems would do well to copy. For instance, news messages maintain a "Path" header that traces their transmission from one news server to another. If news server A receives a request from server B, and A's copy of a message lists B in the Path header, A will not try to retransmit that message to B. Since the purpose of NNTP transmission is to make sure every news server on Usenet can receive an article (if it wants to), the Path header avoids a flood of repeated messages. Gnutella, as an example, does not use a similar system when transmitting search requests, so as a result a single Gnutella node can receive the same request repeatedly.
The open, decentralized nature of Usenet can be harmful as well as beneficial. Usenet has been enormously successful as a system in the sense that it has survived since 1979 and continues to be home to thriving communities of experts. It has swelled far beyond its modest beginnings. But in many ways the trusting, decentralized nature of the protocol has reduced its utility and made it an extremely noisy communication channel. Particularly, as we will discuss later, Usenet fell victim to spam early in the rise of the commercial Internet. Still, Usenet's systems for decentralized control, its methods of avoiding a network flood, and other characteristics make it an excellent object lesson for designers of peer-to-peer systems.
The Domain Name System (DNS) is an example of a system that blends peer-to-peer networking with a hierarchical model of information ownership. The remarkable thing about DNS is how well it has scaled, from the few thousand hosts it was originally designed to support in 1983 to the hundreds of millions of hosts currently on the Internet. The lessons from DNS are directly applicable to contemporary peer-to-peer data sharing applications.
DNS was established as a solution to a file-sharing problem. In the early days of the Internet, the way to map a human-friendly name like bbn to an IP address like 220.127.116.11 was through a single flat file, hosts.txt, which was copied around the Internet periodically. As the Net grew to thousands of hosts and managing that file became impossible, DNS was developed as a way to distribute the data sharing across the peer-to-peer Internet.
The namespace of DNS names is naturally hierarchical. For example, O'Reilly & Associates, Inc. owns the namespace oreilly.com: they are the sole authority for all names in their domain, such as www.oreilly.com. This built-in hierarchy yields a simple, natural way to delegate responsibility for serving part of the DNS database. Each domain has an authority, the name server of record for hosts in that domain. When a host on the Internet wants to know the address of a given name, it queries its nearest name server to ask for the address. If that server does not know the name, it delegates the query to the authority for that namespace. That query, in turn, may be delegated to a higher authority, all the way up to the root name servers for the Internet as a whole. As the answer propagates back down to the requestor, the result is cached along the way to the name servers so the next fetch can be more efficient. Name servers operate both as clients and as servers.
DNS as a whole works amazingly well, having scaled to 10,000 times its original size. There are several key design elements in DNS that are replicated in many distributed systems today. One element is that hosts can operate both as clients and as servers, propagating requests when need be. These hosts help make the network scale well by caching replies. The second element is a natural method of propagating data requests across the network. Any DNS server can query any other, but in normal operation there is a standard path up the chain of authority. The load is naturally distributed across the DNS network, so that any individual name server needs to serve only the needs of its clients and the namespace it individually manages.
So from its earliest stages, the Internet was built out of peer-to-peer communication patterns. One advantage of this history is that we have experience to draw from in how to design new peer-to-peer systems. The problems faced today by new peer-to-peer applications systems such as file sharing are quite similar to the problems that Usenet and DNS addressed 10 or 15 years ago.
The network model of the Internet explosion (1995-1999)
The explosion of the Internet in 1994 radically changed the shape of the Internet, turning it from a quiet geek utopia into a bustling mass medium. Millions of new people flocked to the Net. This wave represented a new kind of people--ordinary folks who were interested in the Internet as a way to send email, view web pages, and buy things, not computer scientists interested in the details of complex computer networks. The change of the Internet to a mass cultural phenomenon has had a far-reaching impact on the network architecture, an impact that directly affects our ability to create peer-to-peer applications in today's Internet. These changes are seen in the way we use the network, the breakdown of cooperation on the Net, the increasing deployment of firewalls on the Net, and the growth of asymmetric network links such as ADSL and cable modems.
The switch to client/server
The network model of user applications--not just their consumption of bandwidth, but also their methods of addressing and communicating with other machines--changed significantly with the rise of the commercial Internet and the advent of millions of home users in the 1990s. Modem connection protocols such as SLIP and PPP became more common, typical applications targeted slow-speed analog modems, and corporations began to manage their networks with firewalls and Network Address Translation (NAT). Many of these changes were built around the usage patterns common at the time, most of which involved downloading data, not publishing or uploading information.
The web browser, and many of the other applications that sprung up during the early commercialization of the Internet, were based around a simple client/server protocol: the client initiates a connection to a well-known server, downloads some data, and disconnects. When the user is finished with the data retrieved, the process is repeated. The model is simple and straightforward. It works for everything from browsing the Web to watching streaming video, and developers cram shopping carts, stock transactions, interactive games, and a host of other things into it. The machine running a web client doesn't need to have a permanent or well-known address. It doesn't need a continuous connection to the Internet. It doesn't need to accommodate multiple users. It just needs to know how to ask a question and listen for a response.
Not all of the applications used at home fit this model. Email, for instance, requires much more two-way communication between an email client and server. In these cases, though, the client is often talking to a server on the local network (either the ISP's mail server or a corporate one). Chat systems that achieved widespread usage, such as AOL's Instant Messenger, have similar "local" properties, and Usenet systems do as well. As a result, the typical ISP configuration instructions give detailed (and often misunderstood) instructions for email, news, and sometimes chat. These were the exceptions that were worth some manual configuration on the user's part. The "download" model is simpler and works without much configuration; the "two-way" model is used less frequently but perhaps to greater effect.
While early visions of the Web always called it a great equalizer of communications--a system that allowed every user to publish their viewpoints rather than simply consume media--the commercial explosion on the Internet quickly fit the majority of traffic into the downstream paradigm already used by television and newspapers. Architects of the systems that enabled the commercial expansion of the Net often took this model into account, assuming that it was here to stay. Peer-to-peer applications may require these systems to change.
The breakdown of cooperation
The early Internet was designed on principles of cooperation and good engineering. Everyone working on Internet design had the same goal: build a reliable, efficient, powerful network. As the Internet entered its current commercial phase, the incentive structures changed, resulting in a series of stresses that have highlighted the Internet's susceptibility to the tragedy of the commons. This phenomenon has shown itself in many ways, particularly the rise of spam on the Internet and the challenges of building efficient network protocols that correctly manage the common resource.
Spam: Uncooperative people
Spam, or unsolicited commercial messages, is now an everyday occurrence on the Internet. Back in the pre-commercial network, however, unsolicited advertisements were met with surprise and outrage. The end of innocence occurred on April 12, 1994, the day the infamous Canter and Seigel "green card spam" appeared on the Usenet. Their offense was an advertisement posted individually to every Usenet newsgroup, blanketing the whole world with a message advertising their services. At the time, this kind of action was unprecedented and engendered strong disapproval. Not only were most of the audience uninterested in the service, but many people felt that Canter and Seigel had stolen the Usenet's resources. The advertisers did not pay for the transmission of the advertisement; instead the costs were born by the Usenet as a whole.
In the contemporary Internet, spam does not seem surprising; Usenet has largely been given over to it, and ISPs now provide spam filtering services for their users' email both to help their users and in self-defense. Email and Usenet relied on individuals' cooperation to not flood the commons with junk mail, and that cooperation broke down. Today the Internet generally lacks effective technology to prevent spam.
The problem is the lack of accountability in the Internet architecture. Because any host can connect to any other host, and because connections are nearly anonymous, people can insert spam into the network at any point. There has been an arms race of trying to hold people accountable--closing down open sendmail relays, tracking sources of spam on Usenet, retaliation against spammers--but the battle has been lost, and today we have all learned to live with spam.
The lesson for peer-to-peer designers is that without accountability in a network, it is difficult to enforce rules of social responsibility. Just like Usenet and email, today's peer-to-peer systems run the risk of being overrun by unsolicited advertisements. It is difficult to design a system where socially inappropriate use is prevented. Technologies for accountability, such as cryptographic identification or reputation systems, can be valuable tools to help manage a peer-to-peer network. There have been proposals to retrofit these capabilities into Usenet and email, but none today are widespread; it is important to build these capabilities into the system from the beginning. Chapter 16, Accountability, discusses some techniques for controlling spam, but these are still arcane.
The TCP rate equation: Cooperative protocols
A fundamental design principle of the Internet is best effort packet delivery. "Best effort" means the Internet does not guarantee that a packet will get through, simply that the Net will do its best to get the packet to the destination. Higher-level protocols such as TCP create reliable connections by detecting when a packet gets lost and resending it. A major reason packets do not get delivered on the Internet is congestion: if a router in the network is overwhelmed, it will start dropping packets at random. TCP accounts for this by throttling the speed at which it sends data. When the network is congested, each individual TCP connection independently slows down, seeking to find the optimal rate while not losing too many packets. But not only do individual TCP connections optimize their bandwidth usage, TCP is also designed to make the Internet as a whole operate efficiently. The collective behavior of many individual TCP connections backing off independently results in a lessening of the congestion at the router, in a way that is exquisitely tuned to use the router's capacity efficiently. In essence, the TCP backoff algorithm is a way for individual peers to manage a shared resource without a central coordinator.
The problem is that the efficiency of TCP on the Internet scale fundamentally requires cooperation: each network user has to play by the same rules. The performance of an individual TCP connection is inversely proportional to the square root of the packet loss rate--part of the "TCP rate equation," a fundamental governing law of the Internet. Protocols that follow this law are known as "TCP-friendly protocols." It is possible to design other protocols that do not follow the TCP rate equation, ones that rudely try to consume more bandwidth than they should. Such protocols can wreak havoc on the Net, not only using more than their fair share but actually spoiling the common resource for all. This abstract networking problem is a classic example of a tragedy of the commons, and the Internet today is quite vulnerable to it.
The problem is not only theoretical, it is also quite practical. As protocols have been built in the past few years by companies with commercial demands, there has been growing concern that unfriendly protocols will begin to hurt the Internet.
An early example was a feature added by Netscape to their browser--the ability to download several files at the same time. The Netscape engineers discovered that if you downloaded embedded images in parallel, rather than one at a time, the whole page would load faster and users would be happier. But there was a question: was this usage of bandwidth fair? Not only does it tax the server to have to send out more images simultaneously, but it creates more TCP channels and sidesteps TCP's congestion algorithms. There was some controversy about this feature when Netscape first introduced it, a debate quelled only after Netscape released the client and people discovered in practice that the parallel download strategy did not unduly harm the Internet. Today this technique is standard in all browsers and goes unquestioned. The questions have reemerged at the new frontier of "download accelerator" programs that download different chunks of the same file simultaneously, again threatening to upset the delicate management of Internet congestion.
A more troubling concern about congestion management is the growth of bandwidth-hungry streaming broadband media. Typical streaming media applications do not use TCP, instead favoring custom UDP-based protocols with their own congestion control and failure handling strategies. Many of these protocols are proprietary; network engineers do not even have access to their implementations to examine if they are TCP-friendly. So far there has been no major problem. The streaming media vendors seem to be playing by the rules, and all is well. But fundamentally the system is brittle, and either through a mistake or through greed the Internet's current delicate cooperation could be toppled.
What do spam and the TCP rate algorithm have in common? They both demonstrate that the proper operation of the Internet is fragile and requires the cooperation of everyone involved. In the case of TCP, the system has mostly worked and the network has been preserved. In the case of spam, however, the battle has been lost and unsocial behavior is with us forever. The lesson for peer-to-peer system designers is to consider the issue of polite behavior up front. Either we must design systems that do not require cooperation to function correctly, or we must create incentives for cooperation by rewarding proper behavior or auditing usage so that misbehavior can be punished.
Firewalls, dynamic IP, NAT: The end of the open network
At the same time that the cooperative nature of the Internet was being threatened, network administrators implemented a variety of management measures that resulted in the Internet being a much less open network. In the early days of the Internet, all hosts were equal participants. The network was symmetric--if a host could reach the Net, everyone on the Net could reach that host. Every computer could equally be a client and a server. This capability began to erode in the mid-1990s with the deployment of firewalls, the rise of dynamic IP addresses, and the popularity of Network Address Translation (NAT).
As the Internet matured there came a need to secure the network, to protect individual hosts from unlimited access. By default, any host that can access the Internet can also be accessed on the Internet. Since average users could not handle the security risks that resulted from a symmetric design, network managers turned to firewalls as a tool to control access to their machines.
Firewalls stand at the gateway between the internal network and the Internet outside. They filter packets, choosing which traffic to let through and which to deny. A firewall changes the fundamental Internet model: some parts of the network cannot fully talk to other parts. Firewalls are a very useful security tool, but they pose a serious obstacle to peer-to-peer communication models.
A typical firewall works by allowing anyone inside the internal network to initiate a connection to anyone on the Internet, but it prevents random hosts on the Internet from initiating connections to hosts in the internal network. This kind of firewall is like a one-way gate: you can go out, but you cannot come in. A host protected in this way cannot easily function as a server; it can only be a client. In addition, outgoing connections may be restricted to certain applications like FTP and the Web by blocking traffic to certain ports at the firewall.
Allowing an Internet host to be only a client, not a server, is a theme that runs through a lot of the changes in the Internet after the consumer explosion. With the rise of modem users connecting to the Internet, the old practice of giving every Internet host a fixed IP address became impractical, because there were not enough IP addresses to go around. Dynamic IP address assignment is now the norm for many hosts on the Internet, where an individual computer's address may change every single day. Broadband providers are even finding dynamic IP useful for their "always on" services. The end result is that many hosts on the Internet are not easily reachable, because they keep moving around. Peer-to-peer applications such as instant messaging or file sharing have to work hard to circumvent this problem, building dynamic directories of hosts. In the early Internet, where hosts remained static, it was much simpler.
A final trend is to not even give a host a valid public Internet address at all, but instead to use NAT to hide the address of a host behind a firewall. NAT combines the problems of firewalls and dynamic IP addresses: not only is the host's true address unstable, it is not even reachable! All communication has to go through a fairly simple pattern that the NAT router can understand, resulting in a great loss of flexibility in applications communications. For example, many cooperative Internet games have trouble with NAT: every player in the game wants to be able to contact every other player, but the packets cannot get through the NAT router. The result is that a central server on the Internet has to act as an application-level message router, emulating the function that TCP/IP itself used to serve.
Firewalls, dynamic IP, and NAT grew out of a clear need in Internet architecture to make scalable, secure systems. They solved the problem of bringing millions of client computers onto the Internet quickly and manageably. But these same technologies have weakened the Internet infrastructure as a whole, relegating most computers to second-class status as clients only. New peer-to-peer applications challenge this architecture, demanding that participants serve resources as well as use them. As peer-to-peer applications become more common, there will be a need for common technical solutions to these problems.
A final Internet trend of the late 1990s that presents a challenge to peer-to-peer applications is the rise in asymmetric network connections such as ADSL and cable modems. In order to get the most efficiency out of available wiring, current broadband providers have chosen to provide asymmetric bandwidth. A typical ADSL or cable modem installation offers three to eight times more bandwidth when getting data from the Internet than when sending data to it, favoring client over server usage.
The reason this has been tolerated by most users is clear: the Web is the killer app for the Internet, and most users are only clients of the Web, not servers. Even users who publish their own web pages typically do not do so from a home broadband connection, but instead use third-party dedicated servers provided by companies like GeoCities or Exodus. In the early days of the Web it was not clear how this was going to work: could each user have a personal web server? But in the end most Web use is itself asymmetric--many clients, few servers--and most users are well served by asymmetric bandwidth.
The problem today is that peer-to-peer applications are changing the assumption that end users only want to download from the Internet, never upload to it. File-sharing applications such as Napster or Gnutella can reverse the bandwidth usage, making a machine serve many more files than it downloads. The upstream pipe cannot meet demand. Even worse, because of the details of TCP's rate control, if the upstream path is clogged, the downstream performance suffers as well. So if a computer is serving files on the slow side of a link, it cannot easily download simultaneously on the fast side.
ADSL and cable modems assume asymmetric bandwidth for an individual user. This assumption takes hold even more strongly inside ISP networks, which are engineered for bits to flow to the users, not from them. The end result is a network infrastructure that is optimized for computers that are only clients, not servers. But peer-to-peer technology generally makes every host act both as a client and a server; the asymmetric assumption is incorrect. There is not much an individual peer-to-peer application can do to work around asymmetric bandwidth; as peer-to-peer applications become more widespread, the network architecture is going to have to change to better handle the new traffic patterns.
Observations on the current crop of peer-to-peer applications (2000)
While the new breed of peer-to-peer applications can take lessons from earlier models, these applications also introduce new characteristics or features that are novel. Peer-to-peer allows us to separate the concepts of authoring information and publishing that same information. Peer-to-peer allows for decentralized application design, something that is both an opportunity and a challenge. And peer-to-peer applications place unique strains on firewalls, something well demonstrated by the current trend to use the HTTP port for operations other than web transactions.
Authoring is not the same as publishing
One of the promises of the Internet is that people are able to be their own publishers, for example, by using personal web sites to make their views and interests known. Self-publishing has certainly become more common with the commercialization of the Internet. More often, however, users spend most of their time reading (downloading) information and less time publishing, and as discussed previously, commercial providers of Internet access have structured their offering around this asymmetry.
The example of Napster creates an interesting middle ground between the ideal of "everyone publishes" and the seeming reality of "everyone consumes." Napster particularly (and famously) makes it very easy to publish data you did not author. In effect, your machine is being used as a repeater to retransmit data once it reaches you. A network designer, assuming that there are only so many authors in the world and therefore that asymmetric broadband is the perfect optimization, is confounded by this development. This is why many networks such as college campuses have banned Napster from use.
Napster changes the flow of data. The assumptions that servers would be owned by publishers and that publishers and authors would combine into a single network location have proven untrue. The same observation also applies to Gnutella, Freenet, and others. Users don't need to create content in order to want to publish it--in fact, the benefits of publication by the "reader" have been demonstrated by the scale some of these systems have been able to reach.
Peer-to-peer systems seem to go hand-in-hand with decentralized systems. In a fully decentralized system, not only is every host an equal participant, but there are no hosts with special facilitating or administrative roles. In practice, building fully decentralized systems can be difficult, and many peer-to-peer applications take hybrid approaches to solving problems. As we have already seen, DNS is peer-to-peer in protocol design but with a built-in sense of hierarchy. There are many other examples of systems that are peer-to-peer at the core and yet have some semi-centralized organization in application, such as Usenet, instant messaging, and Napster.
Usenet is an instructive example of the evolution of a decentralized system. Usenet propagation is symmetric: hosts share traffic. But because of the high cost of keeping a full news feed, in practice there is a backbone of hosts that carry all of the traffic and serve it to a large number of "leaf nodes" whose role is mostly to receive articles. Within Usenet, there was a natural trend toward making traffic propagation hierarchical, even though the underlying protocols do not demand it. This form of "soft centralization" may prove to be economic for many peer-to-peer systems with high-cost data transmission.
Many other current peer-to-peer applications present a decentralized face while relying on a central facilitator to coordinate operations. To a user of an instant messaging system, the application appears peer-to-peer, sending data directly to the friend being messaged. But all major instant messaging systems have some sort of server on the back end that facilitates nodes talking to each other. The server maintains an association between the user's name and his or her current IP address, buffers messages in case the user is offline, and routes messages to users behind firewalls. Some systems (such as ICQ) allow direct client-to-client communication when possible but have a server as a fallback. A fully decentralized approach to instant messaging would not work on today's Internet, but there are scaling advantages to allowing client-to-client communication when possible.
Napster is another example of a hybrid system. Napster's file sharing is decentralized: one Napster client downloads a file directly from another Napster client's machine. But the directory of files is centralized, with the Napster servers answering search queries and brokering client connections. This hybrid approach seems to scale well: the directory can be made efficient and uses low bandwidth, and the file sharing can happen on the edges of the network.
In practice, some applications might work better with a fully centralized design, not using any peer-to-peer technology at all. One example is a search on a large, relatively static database. Current web search engines are able to serve up to one billion pages all from a single place. Search algorithms have been highly optimized for centralized operation; there appears to be little benefit to spreading the search operation out on a peer-to-peer network (database generation, however, is another matter).
Also, applications that require centralized information sharing for accountability or correctness are hard to spread out on a decentralized network. For example, an auction site needs to guarantee that the best price wins; that can be difficult if the bidding process has been spread across many locations. Decentralization engenders a whole new area of network-related failures: unreliability, incorrect data synchronization, etc. Peer-to-peer designers need to balance the power of peer-to-peer models against the complications and limitations of decentralized systems.
Abusing port 80
One of the stranger phenomena in the current Internet is the abuse of port 80, the port that HTTP traffic uses when people browse the Web. Firewalls typically filter traffic based on the direction of traffic (incoming or outgoing) and the destination port of the traffic. Because the Web is a primary application of many Internet users, almost all firewalls allow outgoing connections on port 80 even if the firewall policy is otherwise very restrictive.
In the early days of the Internet, the port number usually indicated which application was using the network; the firewall could count on port 80 being only for Web traffic. But precisely because many firewalls allow connections to port 80, other application authors started routing traffic through that port. Streaming audio, instant messaging, remote method invocations, even whole mobile agents are being sent through port 80. Most current peer-to-peer applications have some way to use port 80 as well in order to circumvent network security policies. Naive firewalls are none the wiser; they are unaware that they are passing the exact sorts of traffic the network administrator intended to block.
The problem is twofold. First, there is no good way for a firewall to identify what applications are running through it. The port number has already been circumvented. Fancier firewalls can analyze the actual traffic going through the firewall and see if it is a legitimate HTTP stream, but that just encourages application designers to masquerade as HTTP, leading to an escalating arms race that benefits no one.
The second problem is that even if an application has a legitimate reason to go through the firewall, there is no simple way for the application to request permission. The firewall, as a network security measure, is outmoded. As long as a firewall allows some sort of traffic through, peer-to-peer applications will find a way to slip through that opening.
Peer-to-peer prescriptions (2001-?)
The story is clear: The Internet was designed with peer-to-peer applications in mind, but as it has grown the network has become more asymmetric. What can we do to permit new peer-to-peer applications to flourish while respecting the pressures that have shaped the Internet to date?
Technical solutions: Return to the old Internet
As we have seen, the explosion of the Internet into the consumer space brought with it changes that have made it difficult to do peer-to-peer networking. Firewalls make it hard to contact hosts; dynamic IP and NAT make it nearly impossible. Asymmetric bandwidth is holding users back from efficiently serving files on their systems. Current peer-to-peer applications generally would benefit from an Internet more like the original network, where these restrictions were not in place. How can we enable peer-to-peer applications to work better with the current technological situation?
Firewalls serve an important need: they allow administrators to express and enforce policies about the use of their networks. That need will not change with peer-to-peer applications. Neither application designers nor network security administrators are benefiting from the current state of affairs. The solution lies in making firewalls smarter so that peer-to-peer applications can cooperate with the firewall to allow traffic the administrator wants. Firewalls must become more sophisticated, allowing systems behind the firewall to ask permission to run a particular peer-to-peer application. Peer-to-peer designers must contribute to this design discussion, then enable their applications to use these mechanisms. There is a good start to this solution in the SOCKS protocol, but it needs to be expanded to be more flexible and more tied toward applications rather than simple port numbers.
The problems engendered by dynamic IP and NAT already have a technical solution: IPv6. This new version of IP, the next generation Internet protocol architecture, has a 128-bit address space--enough for every host on the Internet to have a permanent address. Eliminating address scarcity means that every host has a home and, in theory, can be reached. The main thing holding up the deployment of IPv6 is the complexity of the changeover. At this stage, it remains to be seen when or even if IPv6 will be commonly deployed, but without it peer-to-peer applications will continue to need to build alternate address spaces to work around the limitations set by NAT and dynamic IP.
Peer-to-peer applications stress the bandwidth usage of the current Internet. First, they break the assumption of asymmetry upon which today's ADSL and cable modem providers rely. There is no simple way that peer-to-peer applications can work around this problem; we simply must encourage broadband connections to catch up.
However, peer-to-peer applications can do several things to use the existing bandwidth more efficiently. First, data caching is a natural optimization for any peer-to-peer application that is transmitting bulk data; it would be a significant advance to make sure that a program does not have to retransmit or resend data to another host. Caching is a well understood technology: distributed caches like Squid have worked out many of the consistency and load sharing issues that peer-to-peer applications face.
Second, a peer-to-peer application must have effective means for allowing users to control the bandwidth the application uses. If I run a Gnutella node at home, I want to specify that it can use only 50% of my bandwidth. Current operating systems and programming libraries do not provide good tools for this kind of limitation, but as peer-to-peer applications start demanding more network resources from hosts, users will need tools to control that resource usage.
Social solutions: Engineer polite behavior
Technical measures can help create better peer-to-peer applications, but good system design can also yield social stability. A key challenge in creating peer-to-peer systems is to have a mechanism of accountability and the enforcement of community standards. Usenet breaks down because it is impossible to hold people accountable for their actions. If a system has a way to identify individuals (even pseudonymously, to preserve privacy), that system can be made more secure against antisocial behavior. Reputation tracking mechanisms, discussed in Chapter 16, and in Chapter 17, Reputation, are valuable tools here as well, to give the user community a collective memory about the behavior of individuals.
Peer-to-peer systems also present the challenge of integrating local administrative control with global system correctness. Usenet was successful at this goal. The local news administrator sets policy for his or her own site, allowing the application to be customized to each user group's needs. The shared communication channel of news.admin allows a community governance procedure for the entire Usenet community. These mechanisms of local and global control were built into Usenet from the beginning, setting the rules of correct behavior. New breed peer-to-peer applications should follow this lead, building in their own social expectations.
The Internet started out as a fully symmetric, peer-to-peer network of cooperating users. As the Net has grown to accommodate the millions of people flocking online, technologies have been put in place that have split the Net up into a system with relatively few servers and many clients. At the same time, some of the basic expectations of cooperation are showing the risk of breaking down, threatening the structure of the Net.
These phenomena pose challenges and obstacles to peer-to-peer applications: both the network and the applications have to be designed together to work in tandem. Application authors must design robust applications that can function in the complex Internet environment, and network designers must build in capabilities to handle new peer-to-peer applications. Fortunately, many of these issues are familiar from the experience of the early Internet; the lessons learned there can be brought forward to design tomorrow's systems.
1. The authors wish to thank Debbie Pfeifer for invaluable help in editing this chapter.
Back to: Sample Chapter Index
Back to: Peer-to-Peer
© 2001, O'Reilly & Associates, Inc.