Chapter 1
By Dale Dougherty
All The Pieces of PIE
Peer-to-peer networking (P2P) neither begins nor ends with Napster. Shawn Fanning did not want Napster to distribute MP3 files from a central Web site. He wanted to put users in control of the content so that they shared files directly with one another. Napster's decentralized file-sharing service made each user's machine a peer in a network of peers, requiring users who download music to also make it available to others.
Napster succeeded in shifting the locus of control from the center to the edge. It also leveraged the resources available at the edge, making each user's PC a server. While Napster's technical success did not help it avoid legal troubles, Napster created a new mindset for those who are building network applications.
We call this "mindset" P2P. In its purest sense, P2P signifies a decentralized architecture. However, even Napster is not "pure" P2P; it uses a centralized server to match people who have music with people who want it. Few applications (or services) can be totally decentralized. What matters is choosing the appropriate balance between centralization and decentralization.
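The balance described above, a central matchmaker with transfers happening between peers, can be sketched in a few lines. This is only an illustration under stated assumptions: the `CentralIndex` class, its method names, and the peer and file names are hypothetical and are not taken from Napster's actual protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical directory: it knows who has which file but stores no files."""

    def __init__(self):
        # file name -> set of peer ids currently offering that file
        self._holders = defaultdict(set)

    def register(self, peer_id, file_names):
        """A peer announces the files it is willing to share."""
        for name in file_names:
            self._holders[name].add(peer_id)

    def unregister(self, peer_id):
        """A peer goes offline; drop it from every listing."""
        for peers in self._holders.values():
            peers.discard(peer_id)

    def lookup(self, file_name):
        """Return the peers that hold a file, in a stable order."""
        return sorted(self._holders.get(file_name, set()))


index = CentralIndex()
index.register("alice", ["song_a.mp3", "song_b.mp3"])
index.register("bob", ["song_b.mp3"])

# The index only answers "who has it?"; the download itself would then
# go directly from one of these peers to the requester.
holders = index.lookup("song_b.mp3")

index.unregister("alice")
remaining = index.lookup("song_a.mp3")
```

The only centralized part is the lookup table. Storage and bandwidth, the expensive resources, stay at the edge.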
Of course, Napster did not discover peer-to-peer networking, but the benefits of decentralization have tended to go unrecognized. Decentralization was an architectural principle of the Internet itself, but over a period of 10 years, the Internet has changed significantly. Some even think that, because the Net (and the Web) are becoming increasingly centralized, they are fundamentally broken.
Napster had to overcome obstacles that are characteristic of today's Internet such as firewalls and the inability to use the Domain Name System (DNS) reliably to locate users' machines. Napster's leveraging of underutilized resources allowed it to become the most scalable software application ever built.
The P2P mindset is continuing to drive innovation, which we will explore in this report. P2P innovation can be understood in terms of new applications, services and infrastructure. The functionality that we find in P2P applications is going to become part of the network infrastructure, as services shared by all kinds of applications. P2P is reshaping the Internet, creating new opportunities to develop and deploy robust, networked applications.
Defining P2P
Clay Shirky has offered a rather solid definition of P2P:
"P2P is a class of applications that takes advantage of resources--storage, cycles, content, human presence--available at the edges of the Internet."[1]
Using Shirky's definition, almost any kind of resource can be a peer. P2P applications make use of techniques for sharing distributed resources.
The most mature application areas for P2P are file sharing and instant messaging in the consumer space. Distributed computation, enterprise file sharing and groupware are gaining a foothold in the enterprise.
Here are some examples of P2P technologies that we'll look at in more detail in this report:
- P2P file-sharing applications provide alternatives to existing mass storage solutions by organizing disk resources at the edges of the Internet. There are obvious cost savings from eliminating file-server farms and their associated local storage arrays. But file sharing has potential benefits in bandwidth sharing (e.g. distributed content streaming), load balancing, fail-over redundancy, collaborative content creation and maintenance, and more.
- Distributed-computation systems, as exemplified by the SETI@Home project, organize processing resources that are present but underutilized. SETI@Home distributed the computational workload among millions of volunteers and their computers. (At the time of writing, SETI@Home has 3,099,544 users contributing 637,309.416 CPU years--offering nothing more than a colorful screen-saver and public recognition.)
- Instant-messaging applications such as AIM represent something of a killer app for P2P. Tens of millions of users are using IM programs in ways that are radically shifting business and personal communication and collaboration. IM is inherently a PIE application, since it is built on presence and identity, with the chat content itself representing edge content.
- Groove Networks provides a shared, secure space for peer-to-peer interactions. It also offers a platform for new kinds of collaborative applications that leverage peer networking. By moving the collaborative space to its users' PCs, Groove removes the dependency on IT for control and administration. Configuration is automatic, adaptive and flexible. We'll look at Groove in Section 5, P2P Groupware.
There are also several, perhaps unexpected, properties of P2P networking that may not be evident from any definition or any of the examples above.
Device independent. P2P is not at all limited to communications among desktop computers, although most P2P applications focus on fully utilizing the power of PCs. The computing world is demanding that all kinds of devices communicate easily with one another. P2P networks have demonstrated that you can organize resources across heterogeneous devices. Web services will take this further.
Real time. P2P applications have proven to be interesting experiments in exploiting new capabilities of real-time communications. Instant messaging and distributed computation can both be considered real-time communications platforms.
Parity. In a peer-to-peer network, the peers are considered equals, even though they may not share the same capabilities.
P2P Equals PIE
We use the term PIE to describe the core elements of P2P applications that we see today and which will become generalized in peer-computing platforms of the future. PIE stands for Presence, Identity and Edge Resources. We'll unravel the PIE acronym in reverse, starting with edge resources.
Edge Resources
Peer-to-peer networks make use of resources at the edge of the Internet. Peering organizes a variable-sized pool of distributed resources.
Today's typical PC is roughly as powerful as a 1987 supercomputer, yet we use PCs mostly for word processing and e-mail, and they sit idle for large parts of the day. The resources available in a company with 100 PCs, or among the 35 million Internet users with broadband access today, collectively comprise a stunning amount of computational horsepower. P2P can organize the computing resources available on machines in a network, even isolating resources such as CPU power, disk storage and connectivity. This is certainly what distributed-computation and file-sharing applications do.
P2P systems, in addition to simply enlarging the available pool of resources and services, can make more effective use of existing resources. Enterprise file sharing and collaborative software may be able to improve the way organizations share important documents, and organize the interactions of those creating and editing those documents. There are several P2P companies that specialize in deciding what documents to share automatically, immediately and dynamically in response to network traffic patterns.
Users are an important network resource. By establishing peering relationships among users, new types of applications are possible.
The big problem with edge resources is that they are transient: sometimes they are available and sometimes not. Reliability, bandwidth and location are all subject to change, so P2P applications must be able to accommodate these properties of edge resources.
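One common way to accommodate transient peers is to treat any single peer as optional and fall back to the next candidate. The sketch below assumes a hypothetical `fetch` callable that raises `ConnectionError` when a peer is unreachable. The peer names and the `fake_fetch` stand-in are invented for illustration and do not describe any particular P2P system.

```python
def fetch_from_any(peers, fetch):
    """Try each candidate peer in turn; return (peer, data) from the first that answers."""
    errors = {}
    for peer in peers:
        try:
            return peer, fetch(peer)
        except ConnectionError as exc:
            # An offline peer is expected at the edge, not exceptional.
            errors[peer] = str(exc)
    raise ConnectionError(f"no peer available: {errors}")


# Stand-in for the network: only "bob" happens to be online right now.
ONLINE = {"bob": b"file contents"}


def fake_fetch(peer):
    if peer not in ONLINE:
        raise ConnectionError("offline")
    return ONLINE[peer]


peer, data = fetch_from_any(["alice", "bob", "carol"], fake_fetch)
```

A real client would add timeouts and might prefer peers with better bandwidth, but the principle is the same: the application plans for absence rather than assuming availability.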
Identity
Peer-to-peer networks must be able to identify uniquely the resources that are available.
While we can use DNS to identify machines on the network, there aren't enough IP addresses to go around. Dynamic DNS and NAT are workarounds for a structural problem with the Internet. (Presumably, it's a situation that will be alleviated with IPv6, but that may be a long time in coming.) P2P systems have had to come up with their own naming schemes that are not dependent on IP addresses. Thus, file-sharing applications manage their own namespaces, allowing users to have persistent identities on the system, regardless of IP address.
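A namespace that is independent of IP addresses can be pictured as a table keyed by user name, where the address is just the most recent value. This is a minimal sketch, assuming a hypothetical `NameRegistry` class. The user name, addresses and port are arbitrary examples.

```python
class NameRegistry:
    """Hypothetical application-level namespace: the name persists, the address does not."""

    def __init__(self):
        self._address = {}  # user name -> (host, port) last announced

    def announce(self, user, host, port):
        """Called whenever a user connects, possibly from a new address."""
        self._address[user] = (host, port)

    def resolve(self, user):
        """Return the user's current address, or None if never announced."""
        return self._address.get(user)


registry = NameRegistry()

registry.announce("alice", "10.0.0.5", 4000)
first = registry.resolve("alice")

# Alice reconnects later from a different dynamically assigned address.
registry.announce("alice", "192.168.1.20", 4000)
second = registry.resolve("alice")
```

Other users keep referring to "alice" throughout. Only the registry needs to know that her address changed.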
In a way, identity is nothing new--we all have our online identities via e-mail. And indeed e-mail is often mentioned as a prototype of a P2P network--a network where all the peers happen to be servers. But e-mail doesn't respect the concept of Edge, and the Edge lives outside of traditional DNS. More importantly, e-mail is massively asynchronous. P2P is about real-time communications, whether that's chatting in IM, serving files in Gnutella or collaborating on a document.
Presence
The final piece of PIE is presence, the ability to tell when a resource is online.
Instant messaging is the most mature P2P app in terms of taking advantage of presence. A buddy list shows which of your friends is online. But IM is just the tip of the iceberg.
Presence allows for any number of services to be offered to users, once their presence online is established. One example is notification. An e-commerce application might leverage a presence management system such as instant messaging to initiate a customer contact about an online order in real time.
Knowing when someone is online offers huge potential for building distributed, user-centric systems.
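The notification example can be sketched as a small presence tracker: clients send periodic heartbeats, and interested parties register a callback that fires when a user comes online. Everything here is an assumption made for illustration, including the `PresenceTracker` name, the 30-second timeout and the message text. Timestamps are passed in explicitly to keep the example deterministic.

```python
class PresenceTracker:
    """Hypothetical presence service driven by client heartbeats."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout   # seconds of silence before a user counts as offline
        self._last_seen = {}     # user -> timestamp of the last heartbeat
        self._watchers = {}      # user -> callbacks to run when the user comes online

    def watch(self, user, callback):
        self._watchers.setdefault(user, []).append(callback)

    def is_online(self, user, now):
        seen = self._last_seen.get(user)
        return seen is not None and now - seen <= self.timeout

    def heartbeat(self, user, now):
        """Record a heartbeat; notify watchers only on an offline-to-online change."""
        was_online = self.is_online(user, now)
        self._last_seen[user] = now
        if not was_online:
            for callback in self._watchers.get(user, []):
                callback(user)


notifications = []
tracker = PresenceTracker(timeout=30.0)
tracker.watch("alice", lambda user: notifications.append(f"{user} is online: send order update"))

tracker.heartbeat("alice", now=0.0)    # comes online, watcher fires
tracker.heartbeat("alice", now=10.0)   # still online, no new notification
online_at_20 = tracker.is_online("alice", now=20.0)
online_at_100 = tracker.is_online("alice", now=100.0)
tracker.heartbeat("alice", now=100.0)  # back after a gap, watcher fires again
```

An e-commerce application of the kind mentioned above would sit in the role of the watcher, holding its message until the customer's presence is established.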
The Whole PIE
We believe that P2P technology is in transition from one-off applications such as Napster to an entire set of peer-networking services that transform P2P into a platform.
The first generation of P2P applications set up closed peer networks, accessible only to them. The next generation is looking to increase interoperability among peers by defining standard XML-based protocols. Gradually, P2P technologies will become more integrated into the network infrastructure so that more applications can utilize peer-computing services.
P2P's evolution should follow the same path as e-mail systems did. The first e-mail systems were closed systems or silos. Gradually, gateways were set up to exchange messages among different systems before a fully standardized infrastructure emerged. Gateways were complex and cumbersome, however. Finally, when the Internet emerged, its open protocols and formats gained a dominant advantage, allowing companies to dispense with gateways.
One sign of interest in P2P interoperability and infrastructure is the Peer-to-Peer Working Group (http://www.ptpwg.org). Its charter is: "To facilitate and accelerate the advancement of infrastructure best-known practices for peer-to-peer computing." The consortium consists currently of 30+ member entities. They have, thus far, not come up with much in the way of a tangible generalized P2P framework.
P2P From the Ground Up
Many P2P applications are silos, and they do not typically interact with other P2P applications. Napster is only for sharing MP3s. Its files and users are contained within Napster and not available to other applications. A Napster user could not send an instant message to a fellow user without, of course, launching an IM client to do so and having outside knowledge of that person. Peers that participate in one distributed computation project such as SETI@Home cannot use the same software to participate in United Devices' project to sequence cancer gene data.
Now this state of affairs is, of course, understandable. Many P2P developers admit that their systems contain many clever hacks invented along the way to solve a specific problem.
Interoperability
Getting P2P applications to interoperate with each other is possible, either by implementing common protocols or using standardized messaging formats based on XML.
Early attempts at interoperability include Freenet, Jabber and Groove Networks. Freenet's open API and XML-RPC interface allow application developers to speak with the network via a Freenet node without having to implement the full suite of complex communication and encryption protocols. Jabber went so far as to create a compelling real-time XML routing architecture, pushing structured data through an instant-messaging stream.
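To make "standardized messaging formats based on XML" concrete, the sketch below encodes and decodes an XML-RPC call using Python's standard `xmlrpc.client` module. The method name `peer.lookup` and its argument are invented for illustration. They are not part of Freenet's, Jabber's or any other real system's API.

```python
import xmlrpc.client

# Encode a call as an XML-RPC request document (a plain XML string).
# "peer.lookup" is a hypothetical method name chosen for this example.
request_xml = xmlrpc.client.dumps(("song_b.mp3",), methodname="peer.lookup")

# Any XML-RPC implementation, in any language, can parse the same document.
params, method = xmlrpc.client.loads(request_xml)

# A response is encoded the same way and carries exactly one return value.
response_xml = xmlrpc.client.dumps((["alice", "bob"],), methodresponse=True)
result, _ = xmlrpc.client.loads(response_xml)
```

Because the wire format is ordinary XML, two applications need to agree only on method names and parameter types, not on each other's internal implementation.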
Of course, leading vendors do not always want interoperability if it erodes an already established advantage. AOL, for instance, is engaged in a game of move/counter-move as it attempts to block messages sent to its AIM user base by alternative instant-messaging clients such as Jabber and Aimster.
While P2P applications may be closer to talking with one another, they are still separate applications with their own baggage, consuming their own share of resources.
P2P as Infrastructure
Once the core services that make up a P2P application become part of the computing or network infrastructure, there will be a common substrate or plumbing that can be shared by P2P applications, making them easier to develop and deploy. While a number of P2P companies claim to be providing infrastructure, it seems clear that Microsoft's .Net Framework and Sun Microsystems' JXTA have the best opportunity to define the core elements of a P2P platform. Groove Networks also has provided a comprehensive decentralized groupware platform, incorporating such P2P services as instant messaging and file sharing.
What is needed are components of a network operating system; what seem to be offered in most cases are extensible, yet specific P2P applications. This is, no doubt, due to the relative immaturity of the technology, and the lack of agreement on the elements of a P2P platform.
That said, there are good reasons to hope that a P2P infrastructure develops soon. Developers find themselves building applications from scratch; they have to invent, construct, test and deploy their own home-brewed P2P Internet-working designs before even getting to the application at hand. Redundant or conflicting systems for managing peer resources also make it hard for P2P to grow beyond its file-sharing origins. With a platform, new and innovative services can be offered--by the system owner, by third parties, by end-users themselves.
P2P as a Market
It doesn't make sense to talk about P2P as a market or industry, any more than it did when the Web emerged in the mid-1990s. With the notable exception of Napster, most of the P2P activity in 2001 is at the leading edge, as innovation follows a technology vision rather than a specific business opportunity.
There are approximately 150 companies involved in developing P2P technology or services. In the past year, we have seen some companies redefine themselves as P2P just as we have seen others distance themselves from the label. This seems to track the rise and fall of P2P hype.
In a survey of 100 of these companies, we found that almost one-third identified themselves as infrastructure companies. That backs our belief that P2P is going to become a platform. However, few of these companies are in a position to establish standardized P2P infrastructure unless they are acquired by a large company.
Approximately $560 million has been invested in P2P companies so far, 84 percent of that in the past 18 months. VC interest has focused on P2P companies targeting enterprise sales. No P2P company has yet derived significant revenue from consumer sales, and we do not anticipate that happening soon with any of the products currently available.
However, it would be a mistake to think that the innovation brought about by P2P will be limited to enterprise-software sales. Even if those companies will be the only ones making money in 2002 and 2003, their revenue streams will be directly impacted by popular technologies with no revenue model whatsoever.
Just as P2P technology is about computing on the fringes of the Internet, development often takes place on the fringes of the workforce. Many innovative P2P technologies were conceived of and implemented by individual developers, not large corporations. Many of them are open-source projects. While lone-wolf programmers may be unlikely to generate a significant revenue stream by themselves, their mere existence can shrink the revenue streams of the larger companies that venture capitalists are funding today. BearShare, created by Vinnie Falco in his spare time, is one of the leading Gnutella clients (along with LimeWire, BearShare is one of the top 10 most downloaded programs as tracked by c|net's Download.com). BearShare is free, and may never result in a big paycheck for Falco, but it will have a dramatic impact on the dozen or so companies trying to build revenue around other Gnutella clients. It is not clear whether any of these open-source projects can successfully establish a common infrastructure for P2P applications.
The best evidence that P2P is significant is that the major vendors are staking out this ground. Microsoft, IBM, Intel and Sun all are involved in developing or supporting P2P efforts.
Even the U.S. government has jumped in, using NextPage's NXT3 platform. From an April 20 article in InfoWorld:
"Instead of buying additional servers and hiring extra IT staff to set up and manage the new system, the government looked to p-to-p technology as a foundation on which it could build a solid file-sharing system that makes use of computing equipment already in place."
Finally, the current business climate dictates that there are very few opportunities worth funding. Consolidation is happening and there will be fewer players. Roku and Popular Power are two examples of companies that had interesting technology but could not get follow-on funding.
Despite the lack of funding, the experimental side of P2P technology will continue to evolve. More often than not, the technology leads and the business opportunities follow. That is what we believe will happen in the long-term with P2P.
P2P and Web Services
P2P and Web services have much in common. Both lay claim to defining the next-generation Internet, and both may be right. P2P is the more general of the two movements, an attempt to weave the world's intermittently connected machines--the "dark matter"--directly into the fabric of the Internet. Because it is such a general movement, and because there has been little progress so far in adopting general standards, there is no guarantee that any two P2P applications will have anything in common.
Web Services, on the other hand, is a narrower technological challenge, an attempt to take the Web model for publishing--loosely coupled components with a simple "request-and-response" model--and apply it to computing using XML messages instead of HTML documents. Because of its insistence on well-defined standards for communication between client and server, standards such as XML-RPC, SOAP, WSDL and UDDI[2], different Web Services are designed to work together--indeed, unlike P2P, interoperability is a fundamental goal.
Furthermore, the players in the two spaces are quite different. P2P is a classic startup-driven business, where companies such as Napster, OpenCOLA and Groove Networks are driving innovation. Web Services, on the other hand, is being driven by big companies attempting to extend the Web--IBM, Sun, Microsoft--with the smaller companies in this space--Userland, DevelopMentor, idoox--usually working with the big firms and one another, rather than in isolation.
Because of their different approaches, and the different mix of industry participants and standards efforts, we believe that P2P and Web Services are sufficiently distinct to merit separate treatment, so we decided to make Web Services the subject of its own report.
As the two movements develop, however, there are also increasing areas of overlap. With Microsoft's HailStorm announcement, for example, it is clear that Microsoft plans to leverage its instant-messaging system as a platform for Web Services. Passport will provide identity services and Microsoft Messenger will offer presence management for networked applications. This introduces both the P and I of PIE, as well as creating Web Services that do not use HTTP as their transport mechanism.
Likewise, many P2P companies are moving toward a standards-oriented architecture similar to Web Services. 3Path, the groupware company that uses SMTP as a transport mechanism, has rebuilt its 2.0 platform around ICE, the Internet Content Exchange XML format used by Vignette and other content publishing systems.
One key area where P2P and Web Services seem to be converging is XML messaging. A continuous thread of conversation in the P2P world has focused on the need for standards. Web Services will help establish the protocols and formats that make it possible for P2P applications to interoperate more easily. Likewise, despite the preservation of the client-server distinction in the Web Services world--as opposed to P2P's servers, nodes and transceivers--there is no reason edge-connected devices can't offer P2P interfaces out into the cloud.
Because P2P is a mindset, while Web Services is a technology (or rather a related set of technologies), it seems likely that in the places where the two merge, Web Services will become the dominant label. However, Web Services will borrow heavily from the lessons P2P has to teach about diverse transport protocols, about real-time communication relying on presence and identity, and about the ability of edge-connected devices to produce as well as consume resources. P2P and Web Services will become increasingly symbiotic, a symbiosis that will affect Web Services more than it will P2P. Our research report on Web Services will cover that symbiosis, and the ways in which increasing leverage of PIE will alter and improve the Web Services landscape.
[1] Clay Shirky, "What's P2P and What's Not." http://www.openp2p.com/pub/a/p2p/2000/11/24/shirky1-whatisp2p.html. 11/24/2000
[2] XML Remote Procedure Call (XML-RPC) and Simple Object Access Protocol (SOAP) are two methods for invoking some sort of computation or calculation on a remote machine; Web Services Description Language (WSDL) is a way of defining a web services interface in XML; Universal Description, Discovery and Integration (UDDI) is an XML-based service designed to make it possible to publish new web services to a searchable directory.
© 2001, O'Reilly & Associates, Inc.