The purpose of open source and free software licensing is to permit and encourage the involvement of licensees in the improvement, modification, and distribution of the licensed work. This open model of software development is the unique strength of the open source and free software movement. While the open source and free software licenses already discussed approach open software development differently, open development is the goal of each.
This chapter describes the basic principles of software development under open source and free software licenses, including the problems of forking, community development under the bazaar and the cathedral models, how open source and free software projects are initiated and maintained, and the effect that license choices can have on software development. This chapter also briefly discusses the basic principles of drafting contracts, for those who are interested in drafting their own software license.
Open source and free software licensing is driven by the development model, or models, that it is intended to encourage. After all, there is little point to permitting the "free" modification and distribution of a work if people do not actually take the opportunity to modify and distribute the licensed work.
These licenses are intended to permit, and indeed, to encourage the contributions of others to the project. Nonetheless, one of the first open development projects relied, at least at the beginning, on a relatively small number of closely-knit developers. This project was Richard Stallman's plan to develop a complete operating system modeled after the Unix operating system but written entirely in free code.[1]
This project created numerous, deeply influential programs, including the widely used Emacs and the GNU C Compiler and, with the arrival of the Linux kernel developed by Linus Torvalds and his associates, resulted in the creation of the first entirely free operating system, the GNU/Linux operating system. Stallman is also the author of the GPL, and the first, and still most important, philosopher of the free software movement.
Nonetheless, the initial projects under the aegis of the Free Software Foundation (the group Stallman founded to serve as the home base for the nascent free software movement) did not rely on the open development model to the same extent as, for example, the Linux project did. Part of the explanation for this is purely a matter of circumstance. The great engine of free software development is the Internet. When Stallman had his epiphany in the early 1980s as to the importance of keeping software free, the Internet was still in its early adolescence. While universities and colleges (particularly those associated with the Department of Defense) and scientific institutions had access to it, relatively few individuals did.
Stallman originally announced his intention to create a complete Unix-compatible software system in the fall of 1983. At that time, he had already written the widely popular Emacs editor, and he started to develop a completely free operating system. The frustration that drove him to this project, which he felt at the increasing strictures placed on free computing and in particular at the application of security protocols, passwords, and "blackbox" binary code, has been well described elsewhere.[2] After he formally resigned from the Massachusetts Institute of Technology's Artificial Intelligence lab, Stallman dedicated himself to creating various components that would become critical parts of the GNU/Linux operating system: the GNU C Compiler, GNU Emacs, the GNU Debugger, the GNU C Library, and perhaps no less importantly, the GNU General Public License.
It is no exaggeration to say that it was Stallman's original intention, and his practice for a considerable period, to undertake the bulk of the work substantially by himself. An episode from around the time of the beginning of the GNU project demonstrated that this was possible. By 1982, a company named Symbolics had hired away more than a dozen programmers from the MIT AI Lab to develop a commercial version of the Lisp operating system (an operating system developed and maintained by the MIT AI Lab) in competition with another company, Lisp Machines, Inc., or LMI, which had also hired numerous MIT hackers. Under its agreement with MIT, Symbolics was contractually required to permit Stallman, as MIT's administrator of the Lisp system, to review the source code but not required to permit MIT to adopt any of that code. Nonetheless, Symbolics, as a matter of custom, permitted Stallman to adopt features from its source code and maintain them in MIT's version of Lisp. Stallman kept MIT's version of Lisp free, and LMI looked to it to see what developments and improvements its competitor, Symbolics, had made.
In early 1982, Symbolics decided to hold MIT to the terms of the agreement and barred Stallman from incorporating changes from its version of Lisp. Stallman viewed this as a declaration of war. In what is still considered one of the major feats in programming history, Stallman spent much of the next two years matching the new features and additions in Symbolics' Lisp on his own, keeping pace with a much larger team of programmers, feature for feature.
In the period from early 1984 to 1990, Stallman was generating useful and influential programs at a phenomenal rate. In addition to GNU Emacs, the GNU Debugger, and the GNU C Compiler already mentioned, Stallman developed GNU versions of several Unix programs, including the Bourne shell, YACC, and awk programs. However, in developing these programs, Stallman relied heavily on his own immense facility as a programmer and a relatively small number of collaborators. While the GPL was designed to ensure maximum freedom to users and programmers for programs developed under the license, Stallman himself, as a project manager, maintained relatively tight supervision over each of the GNU projects.
This led, perhaps inevitably, to the first major stumbling block of the GNU project. Stallman, quite deliberately, had organized his operating system around a piecemeal approach in which the tools for the system would be written before the kernel, its central component. By 1990 or so, that kernel was the last major piece not to have been completed. Stallman and the GNU project had been working on a kernel since at least 1987, starting first with a kernel based on Trix, an MIT program. By 1993, however, the GNU project, having abandoned Trix, had gotten bogged down in a micro-kernel called Hurd.
There were a number of issues that slowed the development of Hurd, including the focus by a more mature Free Software Foundation on the theoretical aspects of micro-kernel development; a breakdown in communication between the GNU Debugger group and the group in charge of developing the kernel; "look and feel" lawsuits that had been brought by Apple and Lotus against the makers of other operating systems (most notably Microsoft); and perhaps not least, limitations on Stallman's own contributions, caused by a disability that prevented him from typing.[3] This temporary setback set the stage for another great open development project, one using a very different development model.
Just two years earlier, in 1991, Linus Torvalds had started work on his own operating system kernel, originally based on the Minix operating system, itself an "open" operating system designed for teaching purposes. In a famous email on August 25, 1991, posted to the Minix newsgroup, Torvalds announced that he was working on a "(free) operating system (just a hobby, won't be big and professional like gnu) for 386 (486) AT clones."[4] By September, Torvalds had released the first version of Linux, Version 0.01. Interest in Torvalds' operating system, at least within the relatively small Minix community, was immediate and intense. Other programmers quickly responded to Torvalds' postings with questions, comments, and suggestions for how to improve the nascent operating system.
These postings set into motion what would quickly become the Linux phenomenon. This process involved, and indeed depended on, the contributions of at first dozens, then hundreds, and now thousands of users, debuggers, and programmers. This development model is likely Torvalds' most significant contribution to open source and free software programming, notwithstanding his own considerable organizational and programming abilities. As the project grew in size and complexity, a structure developed organically, with other noteworthy programmers, such as Alan Cox, Dave Miller, and Ted Ts'o, taking on significant roles in managing the burgeoning growth of the project. These three, and others, act as intermediaries between Torvalds, who remains at the center of the project, and the many contributors to it.
As Eric Raymond put it in his essay "The Cathedral and The Bazaar," "Linus's cleverest hack was not the construction of the Linux kernel itself, but rather his invention of the Linux development model."[5] As described by Raymond, this development model is dependent on a number of interlocking conditions. The first is the importance of users. Every program needs a constituency of users who use the program, want the program to work, and are sufficiently committed to make at least some effort toward improving it, whether it be by contributing bug reports or patches. The consistent involvement of such users makes the discovery and elimination of bugs easier.
The second is the maxim of "release early, release often." By releasing early and quickly incorporating changes from users, project developers keep their user base actively engaged and involved. When a user notices a bug, submits a patch, and then a few weeks (or even days) later sees the improvement he suggested worked into a new release, he sees immediately the benefits of the development model. He has been rewarded, not financially, but by the availability of a better program. This reward, of course, is shared within the entire community of developers.
The "release early, release often" strategy also cuts down on the possible duplication of effort by a number of users/programmers working, unknown to each other, to identify and fix the same bug. When a problem is quickly identified and its solution is incorporated into a new release, the number of users (and hence potential debuggers) exposed to that solved problem is reduced.
This debugging strategy takes advantage of the many different perspectives, and different uses, put to the program by a spectrum of users. While a bug may seem difficult to isolate from the perspective of a single programmer, that same bug may, upon exposure to a hundred different users and programmers, seem immediately obvious to just one of them. As long as that one is sufficiently committed to submit a detailed bug report or a patch, the project has progressed, and probably more quickly and easily than a more tightly focused, but smaller, group of programmers would have reacted.
This debugging perspective does not necessarily address the complex problems of organizing group work on developing source code in the first instance. In such cases, depending on the development model, adding more programmers to a project may not quicken development, but in fact may slow it down as the additional costs associated with communicating information among a larger group of people outweigh the incremental benefit of adding programmers to a project. While the Linux development model has kept direction and focus within a relatively small circle, as may well be necessary for a software project of any size to survive, much less one of the size and complexity of Linux, its openness has been its strength. By encouraging "egoless" contributions that are improvements to an already established workflow, as opposed to redirections of that workflow, the Linux development model avoids much of the drag that can result from the difficulties in social and information engineering in large, traditional, software projects.
This bazaar model contrasts with what Raymond describes as the cathedral model of software development. Software development, in its traditional form, relies on tightly focused, relatively small groups of programmers associated with a single institution or corporation. Such groups sometimes are as small as two or even just one programmer. Unix itself was the creation of legendary hacker Ken Thompson at Bell Labs: it was written in the programming language C, itself written by another hacker, Dennis Ritchie. Both Unix and C were designed to be simple (or at least simpler than their contemporary competitors). This simplicity and their immense popularity made them prototypes for Linux and the GNU programs that came after them.
Their simplicity and portability made them popular among programmers. Despite an almost total lack of interest by AT&T (Bell Labs' corporate parent), Unix and C spread quickly, first inside AT&T and then outside it. By 1980, they were commonplace in universities and research institutions. Unix, the model for the GNU project and Torvalds' Linux project, set the stage for open source development.
Nonetheless, Unix itself never became a truly open development project.[6] Although there were a number of "hot-spot" programming communities (including Berkeley, the AI and LCS labs at MIT, and Bell Labs itself), these communities were largely self-contained and, although relatively large in the number of programmers they had, did not have the mass to support an open development project, even had there been one. The absence of such a project was due in part to the legal limitations imposed by trade secret and copyright protection, and to the movement toward commercialization of software in the late 1970s and early 1980s. The same trends that led to Stallman's Symbolics war and his subsequent exit from the MIT AI Lab were closing doors to open development projects. Software, once given away for free with expensive hardware, was becoming a booming business in itself.
In its traditional form, commercial software development is based on the exploitation of the monopoly created by copyright for competitive advantage. It makes sense in that system to avoid any process that would undermine that advantage, such as, for example, the sharing of source code with thousands of potentially competing strangers. Programmers for commercial concerns do "work-for-hire": the code they write does not belong to them but to their employers. They are routinely required to sign non-disclosure agreements, preventing them from disclosing to anyone else information that is proprietary (i.e., what their employer considers to be proprietary). Such programmers are also frequently asked to sign non-compete agreements, which prevent them from working for their employer's competitors for a year or two (or more) after they leave that employer. In this environment of deliberate concealment of any information that could be of use to the competition, the idea of open source is anathema.
This emphasis on secrecy channeled commercial programmers into cathedral-style models of software development. While such companies are free to hire as many programmers as they may need, even the resources of a company such as Microsoft are limited.[7] No user base (or almost no user base) would be willing to subject itself to the disclosure restrictions that are required to maintain the commercial advantage software companies want.[8] Without open source code and knowledgeable (and energized) users, bug reports, to the extent they are submitted, greatly diminish in value to the project. What results is a relatively small group of programmers, as talented as the resources and attractiveness of the company can gather, building the software project essentially in secret and presenting it as a black box to the software-buying public.
This model of software development is not limited to commercial development. The GNU project, while certainly not anywhere near as "closed" as traditional commercial software development, relied heavily on the contributions of a relatively small number of people who were relatively tightly organized. The GNU project did not, at least in its early days, follow a "release early, release often" model. Its ability (or desire) to incorporate bug reports and patches submitted by users outside the project was limited accordingly. This should not be read as a slight to the GNU project. GNU Emacs has incorporated the suggestions of hundreds of participants over more than 15 years of development and stands as a highly respected model of free software development. In addition, the GPL built a foundation for the open development model.
What really accelerated the full bloom of the Linux development model, however, and the astonishingly rapid development of Linux itself, was what Raymond calls "cheap Internet." While the predecessor of the Internet, ARPANet, had been available at most research universities and institutions since the 1970s, the available bandwidth was small and access was limited. The cascading expansion of the Internet from 1990 or so on allowed a whole new realm of users to access it for email, Usenet groups, and surfing the newly developed World Wide Web.
The availability of software archives accessible by the Internet, Usenet groups open to contributors, and most importantly, email to permit communication between project originators, contributors, and users, were all necessary for the success of the Linux development model on the scale that Linux itself has achieved. The legal infrastructure of open source combined with the technical infrastructure of the Internet to make this new approach possible.
The Linux development model is obviously not the only one for developing software. It depends on the commitment and knowledge of its user base to succeed. Such users simply may not be available for every type of program. End user applications (such as video games) have been slow to develop under open source or free software development models.
Nonetheless, the Linux development model is useful (and powerful in its applications) for much more than just Linux itself. The same Linux-style development has been used successfully for a large number of programs.
While the choice of a particular license is an important factor, it is far from the only factor in determining the development of any given project. Both Linux and the GNU project's many developments were created under the same license, the GPL.[9] Nonetheless, as just described, they follow very different patterns of development. The circumstances surrounding the development of a project, and, in particular, the personalities of those involved and the technology available to its originators, developers, and users, can have far more to do with the success of a project than the choice of a particular license.
The open development model may even keep code "open" when the governing license would permit it to be closed by distributing it under a proprietary license. For example, as described in Chapter 2, the Apache License permits distribution of modified versions under proprietary licenses. In June of 1998, IBM announced that it would ship Apache as part of its WebSphere group of programs and provide continuing enterprise level support for it.[10] As a natural consequence of this adoption, IBM developed its own modifications to the Apache software and distributed them under a license that it had written for this purpose, the IBM Public License. The original Apache license permitted IBM to license its modifications under a proprietary license and not to disclose their source code, and the IBM Public License did nothing to limit its ability to do so. Nonetheless, IBM continued to publish its source code and to freely permit the adoption or modification of its own work. The reason for this was simple. If IBM kept its code proprietary, eventually its version of Apache would depart from the standard Apache version. Future modifications to the standard version would become more difficult to port to IBM's version. IBM would lose the benefits of the open development process for its own version of Apache, as users and potential contributors would have less incentive to contribute bug reports or patches to it, particularly when a strong competitor, such as standard Apache, existed in the same marketplace.
In short, if IBM wanted to remain a contributor to the process (as well as a beneficiary in the fullest sense), it had to contribute, or at least not keep to itself whatever contributions it had already made. Regardless of the terms of either of the applicable licenses, IBM's or Apache's, to get the full benefits of open source development, IBM had to live by open development rules.
[1] The following discussion draws heavily from the essay of Eric Raymond, "The Cathedral and the Bazaar," in The Cathedral & The Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, Eric S. Raymond (O'Reilly, 2001).
[2] The circumstances surrounding Stallman's decision to begin work on the GNU project are described in Free As In Freedom: Richard Stallman's Crusade for Free Software, Sam Williams (O'Reilly, 2002).
[3] For a more detailed discussion of the Hurd micro-kernel and the difficulties in its development, see Free As In Freedom: Richard Stallman's Crusade for Free Software, Sam Williams (O'Reilly, 2002) at pages 146 and following.
[4] Torvalds' email as reprinted in Rebel Code: Inside Linux and the Open Source Revolution, Glyn Moody (Perseus Publishing, 2001) at page 42.
[5] The Cathedral & The Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, Eric S. Raymond (O'Reilly revised ed. 2001) at page 29.
[6] It is an irony worth noting that the current holder of the rights to Unix, the SCO Group, has sponsored numerous continuing lawsuits against users of GNU/Linux distributions under the theory that some, as of this writing unspecified, portion of these distributions contains Unix code under the copyright held by SCO Group.
[7] Microsoft's Shared Source Initiative, briefly described in Chapter 5, is driven in large part by its attempt to engage with this problem, that is to say, to involve as large a group of developers and users in its process without surrendering its legal rights under copyright law.
[8] The Sun Community Source License, described in Chapter 5, with its restrictions on distributions outside the community of developers, is a step in that direction.
[9] The very first releases of Linux were released under an open source license of Torvalds' own devising. Torvalds, however, adopted the GPL early on and it has covered every subsequent distribution of Linux.
[10] The circumstances surrounding IBM's decision to support Apache are described in Rebel Code: Inside Linux and the Open Source Revolution, Glyn Moody (Perseus Publishing, 2001) at page 205 and following.