Open Sources 2.0: The Continuing Evolution
By Chris DiBona, Mark Stone, Danese Cooper
October 2005
Pages: 488


Table of Contents

Chapter 1: The Mozilla Project: Past and Future
Mitchell Baker
The Mozilla project was launched on March 31, 1998. On this date, the source code for the Netscape Communicator product was made publicly available under an open source license, the "Mozilla Organization" was founded to guide the project, and development of the codebase began to move from a proprietary model into an open model coupled with commercial involvement and management practices.
Of these three elements, the release of the source code is discussed in Open Sources. In summary, the source code was prepared for public release by removing all code that Netscape didn't have the right to license under an open source license, and then replacing those pieces necessary for the code to compile and run. At the same time, a new open source license—the Mozilla Public License—was written, reviewed, and accepted by the open source community, including the Open Source Initiative (http://www.opensource.org). The other two topics—the story of mozilla.org and the development of the Mozilla project—are the subject of this essay. The creation of the Mozilla Public License is generally an untold story, but it occurred during the time covered by the original Open Sources book and isn't discussed in detail here.
Each of these three activities was a step into the unknown. Basic development principles of the open source model ("running code speaks," peer review, leadership based on technical merit) were known. But the combination of open source techniques with an active, focused commercial management structure was uncharted territory. The shift of authority from a commercial management structure to a separate organization was new, and presented many management challenges. The development of project management techniques and tools that could be shared by multiple commercial development teams and a volunteer community was new. Development of a large, complex end-user application in the open source space was new.
Of course, the Mozilla project was not the first open source project with commercial involvement. Cygnus, many of the Linux distributors, and Sendmail were all companies involved with open source development, and the Apache project was developing experience in coordinating open source development where some of the contributors were paid by their employers. But none of these projects provided more than a rough set of guidelines for how the Mozilla project might operate. The Mozilla project was unusual, and at the time perhaps unique, in the way project leadership interacted closely with both commercial teams (project managers, people managers, and engineers) and individual contributors.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Founding of the Mozilla Organization: Obvious for Developers, a Bold Step for Management
The Mozilla project originally grew out of Netscape Communications Corporation and its Netscape Communicator product. In early 1998, the Netscape management team made the decision to continue development of Netscape's flagship product, Netscape Communicator, through an open source development model. At the time, Netscape Communicator and Microsoft's Internet Explorer browser were locked in a fierce competitive battle often referred to as the "browser wars." Netscape's goal was to seed a broad-based development effort within the software development community to produce future browser products as a shared resource.
At its inception, the Mozilla project faced some paradoxes. First, the only people familiar enough with the code to participate actively in its development were Netscape employees. Those employees were still expected to work within the management system and practices that Netscape had developed in its proprietary days. There was no volunteer community. And yet, even at that early time, it was clear that the long-term success of the project required a broad constituency of people and companies working jointly on the project. It was not enough to have open source code (code available under an open source license). The project needed an open development process, and this required authority over the code's development to be based on technical merit and distributed outside Netscape. The question was how to get there from here.
One thing was clear: the success of the project depended on it being a real open source project. In other words, the project needed to have technical legitimacy, and development decisions would need to be guided by technical considerations. This was intuitively clear to the group of Netscape employees who were familiar with open source, who were eager to help move the Mozilla code into the open source world, and who ultimately became the founding members of the Mozilla Organization. This group made the need clear to Netscape management, which was receptive to trying to do the right thing.
When the Mozilla project was officially launched, Netscape executive management therefore took some bold steps. First, they officially anointed "mozilla.org" as the steward of the codebase and leader of the project. I say officially because it's quite possible that a group like mozilla.org would have developed even if Netscape hadn't officially helped to create one. But this step was important, as it allowed mozilla.org to focus on building the project rather than on proving the necessity of its role.
Young Adulthood—the Mozilla Foundation
The idea of an independent legal organization to guide the Mozilla project had been discussed when the project was first launched in 1998. However, it was decided that the time was not quite right. At the time, there were no models for setting up such an organization and figuring out how it would be governed, who would participate, and so on. There was enough unknown and far too much work in getting the code ready, the project launched, and a browser developed to take on things we didn't absolutely have to do. Eventually we decided that the right time to create an independent Mozilla Foundation would be when a critical mass of people was interested in supporting a foundation. That critical mass would need to include a significant set of volunteers and a set of companies interested enough to fund browser development and distribute Mozilla-based technology.
That critical mass began to develop with the release of Mozilla 1.0. Mozilla 1.0 showed that we could produce a good product, that the Mozilla releases were determined by Mozilla rather than by Netscape, and that the project had a positive future. At least one critical corporate participant came to us and told us that 1.0 proved our viability and that they were very interested in helping form and support an independent Mozilla Foundation.
Following the release of Mozilla 1.0, I spent a fair amount of time thinking about what an independent Mozilla Foundation would look like, how we might put it together, how many employees we would need, which companies would likely provide support, and how to finance employees in the early years. I had help from a set of mozilla.org staff members. In addition, I had the good fortune of hooking up with Mitch Kapor, who had recently joined the open source world with the launch of the Open Source Applications Foundation (http://www.osafoundation.org). Mitch was an immense help in thinking through various possible structures for the Mozilla Foundation and is an unsung hero in getting the Mozilla Foundation launched.
In the spring of 2003, the stars aligned. Mozilla.org staff was ready, the project had developed a critical mass, and we had some corporate support. In addition, AOL decided it was ready to help spin out the Mozilla project. This was an important element for mozilla.org staff. Of course, we could have launched a project without AOL's support (that's the nature of open source), but the mozilla.org staff felt that AOL's support was important to the launch of an independent Mozilla project. We hoped that the use of the Mozilla trademarks would be transferred to a new organization, along with a set of machines. We wanted to be able to hire a group of people, some of whom were current AOL employees, without bad feelings. We felt it was very important to the project's stability to have a smooth transition from AOL to a successor. We also knew we needed to hire people to keep the project running well, and that it would take us time to find ongoing funding sources. So, the seed funding that AOL provided was another critical factor. Through July, I worked to reach agreement with AOL on how the Mozilla Foundation would be launched. Once again, Mitch Kapor provided invaluable assistance in helping to get the arrangements with AOL worked out.
The Future
The mission of the Mozilla project is to promote choice and innovation on the Web by creating great end-user offerings. We focus on innovation because the Web is still young—we've seen only the beginnings of its potential. That potential can be stifled if we don't have innovative work done on the client side.
We focus on choice because this allows people to have greater control over their Internet experience. This control over our life on the Web increases in importance each year, as more and more critical functions such as banking, health care, insurance, and commerce are done over the Web. A monoculture is rarely a healthy ecology. A single effective choice in browsers and email clients is dangerous, both to consumers and to the health of the Web itself.
Firefox in particular has shown that consumers will pay attention to a product that provides an alternative, and that the Mozilla project can create such a product. We have a number of challenges ahead of us. We need to continue to release products that people love. We have a set of responsibilities that come with the user base, adoption rate, and increased visibility of the project. Conditions will change, and we will need to adapt. These are challenges, but certainly no greater than those we have faced to date. These are the challenges that result from the project's achievements. We have great talent, a powerful and creative community, a well-earned place in the Internet ecosystem, a growing user base, and, at long last, a legal home for the Mozilla project in the Mozilla Foundation.
As we go forward, there is no change in the mission of the project. Our basic approach of combining open source DNA with involvement by commercial entities will continue. The Mozilla Foundation has grown some and may grow some more, and we expect to continue working closely with a set of companies that are interested in developing and distributing Mozilla technology. The increasing acceptance of open source software by the commercial world opens up greater possibilities for collaboration. The emergence of web-based services provided through the browser also encourages business models for the service provider other than charging for each copy of software provided. This allows more entities to contribute to our project. Our focus on distributed development, technical excellence, and welcoming new participants will continue. The need for a vibrant, creative community of people focused on the Web will not change.
Chapter 2: Open Source and Proprietary Software Development
Chris DiBona
In this chapter, I present a perspective on the similarities, differences, and interactions between open source and proprietary software development.
Proprietary Versus Open Source?
Before you go any further, throw off any notion that the proprietary developer is somehow a different person from the open source developer. It is uncommon for a member of the open source developer community to do only open source for a living. Only the most prominent, or loaded, members of the open source community come close to having this kind of freedom. It is indeed rare to find a developer who develops only with proprietary tools and libraries. Even Visual C++ and C# developers benefit from a great variety of code and libraries that are free for use in their programs.
My career has focused on open source development for the last 10 years, and I'm constantly pleasantly surprised by how much open source and proprietary development resemble each other. I believe this is because proprietary developers are educated by the adventures of their slightly crazy open source cousins, but I also know that open source developers have learned just as much from proprietary developers.
Don't read this as an attempt to muddy the difference between proprietary and open source programs. They are different, sometimes very much so. However, they come from the same people, and they're using a lot of the same methods and tools. It is the licenses and the ideals behind open source programs that make them remarkable, different, and revolutionary.
A lot of people, when talking about open source software development, say that open source developers enjoy a great productivity gain from code reuse. This is true, but in my experience all developers, not just open source developers, benefit from the existence of free-of-charge standard libraries and code snippets. For decades, proprietary developers have had a great variety of prepackaged libraries to choose from, but these proprietary libraries haven't taken root in the same way that freely usable, open libraries have.

Section 2.1.1.1: Code reuse? Knowledge reuse!

In Linus Torvalds' essay from the first Open Sources, he talked about how the rise of open code was delivering on the promise of reuse touted by proponents of the Java programming language specifically and object-oriented programming in general.
Comfort
Maybe you just want to do it yourself. Businesspeople in the industry who have grown up around open source often comment that duplication of effort, or "reinventing the wheel," is not time well spent. I rarely hear this from programmers. When people hear about KDE and GNOME, or Linux and BSD, or even more esoteric arguments about which window manager to use, inevitably someone will chime in, "Obviously, they had a lot of time on their hands. Otherwise, why would they have started from scratch?"
The implication is that the programmers have somehow wasted time. When I choose to reimplement some technology or program, I know what I'm doing, and even if it is a "waste" of time or duplication of effort, I think of it as practice. And when I can enjoy the luxury of implementing from scratch, I really like the results, because they're all mine and what I've developed works exactly the way I want it to.
Business, of course, is interested in productive developers, and productive developers don't rewrite things, right? No, not necessarily. People rewrite code all the time. The more-informed companies recognize that this type of thing is often inevitable, and the best and most resourceful encourage this kind of mental knife sharpening, because it leads to better developers and better code. Given the time, programmers often prefer to learn from other people's code without actually using the code, and if open source ends up as one big repository of example code, I call that a success.
Also, computers change. Computers, languages, compilers, and operating systems change so quickly that a periodic rewrite of some code becomes vital, from a performance perspective. To take advantage of the newest processors, architectures, and other advancements, a recompile will certainly be required and will likely expose issues with your code (architecture changes lead to this directly).
But people are using libraries, code, and examples from open source code, copying them into their codebases rapidly. Certainly this happens. Don't let my counter cases fool you. It is a rare codebase that doesn't involve some open source software, whether it is merely in the form of a standard library or a widget library, or is full of the stuff. This is by design; if every program had to write every instruction down to the operating system, or the machine itself, there would be no programs. The iterative building process, programs on top of libraries on top of the operating system, is so productive that I can't imagine someone ignoring it. Even for the smallest embedded systems, designers are using the GNU compilers to create great programs for their devices: compile, flash, and go.
Distributed Development
Distributed development is more than just a fad or even a trend. Organizations and companies large and small are using diverse, globally distributed teams to develop their software. The free software development movement showed the world how to develop internationally. Well before SourceForge.net became a site that every programmer had heard of, projects working together over the Internet or far-flung connected corporate networks developed much of the software that we use today.
In fact, the tools they developed to do that are now considered the baseline standard for developers everywhere. What company in its right mind doesn't mandate that its programmers use some form of version control and bug tracking? I ask this rhetorically, but for a long time in the software business, you couldn't make this assumption. Small development shops would back up their data, for sure, but that's not version control.
Distributed development is about more than just version control. It's also about communications and bug tracking and distribution of the end result of software.
Programming is an inherently incremental process. Code, then build, then test. Repeat. Do not fold, spindle, or mutilate. Each step requires the developer to save the program and run it through a compiler or interpreter. After enough of these cycles, the program can do a new thing or an old thing better, and the developer checks the code into a repository, preferably not on his machine. Then the repository can be backed up or saved on a hierarchical storage system. Then, should a developer's workstation crash, the worst case is that the only work lost is that done since the last check-in.
What is actually stored from check-in to check-in is the difference from one version to the next. Consider a 100-line program in which three lines read:
for (i=1; i < 100; i++) {
    printf("Hello World\n");
}
and one line needs to be changed, so that they read:
for (i=1; i < 100; i++) {
    printf("Hello to a vast collection of worlds!\n");
}
Collaborative Development
You have a developer in Tokyo, a team in Bangalore, a team in Zurich, and a shop in Seattle, all working on the same codebase. How can you possibly keep the development train from coming off the rails? Communication!
One might imagine that only now, with the advent of IM and VoIP, can developers keep up with each other. In fact, developers have stayed in touch in something approximating real time since the early days of Unix, when they began to have a great variety of communications tools to use. Early on, two developers on the same machine used the Write or Talk Unix programs, which allowed for a simple exchange of text between users. This grew into Internet Relay Chat (IRC) and then Instant Messenger (IM).
Email itself plays the most important role in development. It is the base packet of persistent knowledge that distributed developer teams have. Wikis are also taking hold as repositories of information.
Strangely (to nondevelopers) voice simply hasn't caught on as a terrific tool for ongoing developer communications. While a regular conference call is useful for keeping everyone moving in the same direction, the idea of vocal input while developing would drive many coders away screaming. The phone isn't evil, but maintaining an uninterruptible flow can be very important to developer productivity. Phones also do not create a logfile or other transcript that can be referred to later. Don't take my experiences for gospel here. Read the book Peopleware for more information about this. Everywhere I've ever worked, the one constant has been developers wearing headphones, but listening to music, not other developers yammering in their ears.
The online site SourceForge.net is the largest concentration of open source projects and code on the planet. SourceForge boasts some 100,000 projects and 1 million registered developers, and people use its integrated version control, project web hosting, file release mechanism, bug control, and mailing lists to write a vast amount of software. Pulling together these features on a free platform for open source developers proved to be a revolutionary concept. Before, people were left implementing this themselves with Bugzilla (a bug-tracking mechanism) and CVS or some other version control/bug-tracking facilities.
Software Distribution
While free software developers know how to code, what about getting the code in front of the user? In the early days of the Free Software Foundation (FSF), the answer was to send out tapes and disks to users who wanted the tools, for a reasonable fee. Now that so many people have connections to the Internet, boxed software is beginning to show its age, but software producers are really just now learning from open source how to distribute software in this way.
When you compile a piece of software, you sometimes end up relying on libraries that you must call from your program to do some task. If you try to run the program without the expected complement of libraries, it cannot run or it may run poorly. Open source developers have created some very smart packaging and installation systems and filesystem methods that can make this a more tractable problem. Once they created these packaging systems and combined them with the Internet, they got online updating. The irony is that, in a lot of ways, Linux and Unix were schooled in this by Windows. A common complaint regarding Linux when comparing it to Windows and OS X is that software can be very difficult to install. One could argue that Windows isn't all that easy to install either, but since Windows is preinstalled on most computers, this is an argument that often falls on deaf ears.
I don't think Linux developers have learned to do installation well yet. There are some standouts, but for the most part, installation ease is still a work in progress. One thing free and proprietary share is the appreciation for and development of online updating systems. This is something Linux distributions get very right. In short, once Linux is installed on your machine, it can be very easy to keep it up to date.
Online updating is a terrific way of getting software onto your machines. More importantly, it is a terrific way to maintain a secure system over time. Since Linux distributions don't have to worry about software license ownership, it is very easy for the software to determine whether to download a patch or fix, and thus many Linux distributions have systems to facilitate this. Proprietary software development houses such as Microsoft are still trying to figure this out. It is a hard problem when you mix it with licensing concerns. Additionally, when it's done wrong, you can literally crash thousands, or in the case of Microsoft and Apple, millions of machines, so it is really critical to do well. That the Debian and Fedora Core Linux distributions do this at all is quite a feat.
How Proprietary Software Development Has Changed Open Source
Open source isn't magic, and developers aren't magicians. No developer is immune to security problems and bugs creeping into his code.
Free, open, proprietary, closed....Bugs happen. I think open source means fewer bugs, and people have written tens of thousands of words explaining how they agree or disagree with me. One thing I know I'm right about is that both kinds of code have bugs. Bugs persist longer in closed codebases, and their closed nature keeps bugs persistent.
If I may paraphrase Socrates, "An unexamined codebase is dead," and by dead I mean killed by the hostile environment that is viruses, worms, crackers, and Trojans. Like bugs, security flaws happen in both free and closed software. As a project matures, it must assemble a mantle of testing and quality assurance (QA) techniques that are vital to its ongoing health. I think open source development has learned much from the processes that proprietary software development houses have come up with to support their paying customers.
As projects mature, so do the testing suites around them. This is a truism for free and for closed software codebases, but the research around this originated in commercial software/hardware and in academia, and open source software has been a ready consumer of this information. The most popular talk I attended lately was on unit testing for Python at the O'Reilly Open Source Conference. The room was packed, with people sitting in the aisles. Testing is huge and is required for any project, free or not.
Scaling is hard. Whether we're talking about development group size, bandwidth, space, or whatever, scaling any programming project is nontrivial.
Software development has its limits. Product teams can't grow too fast or too large without one of two things happening—either disintermediating technology or project ossification. Fred Brooks's seminal book, The Mythical Man-Month, covered this in depth, and the existence of F/OSS development methodologies doesn't change that. In fact, the tools and changes free software has brought to prominence are all around disintermediation and disconnected collaboration.
Some Final Words
While open source software is about freedom and licenses, it is nonetheless true that open source costs less, under many circumstances, than proprietary software. This is an important aspect of free software. Additionally, it has to be cost competitive against other free products, just as software that costs money must compete against an open source/free offering.
When I say "competes against other free products," I'm talking about pirated copies of Windows, Office, SQL Server, Oracle, and many others competing against Linux, OpenOffice, MySQL, Postgres, and other best-of-breed free software applications. These applications are doing very well in environments that have little regard, legally or culturally, for software licenses.
Free things have a velocity all their own, and people forget that. I'll leave you with a little anecdote from when I was working for a large law firm in Washington, DC. I was still in college studying computer science, and I ran the law firm's email network during the day. This was 1996 or so, and TCP/IP was clearly the big winner in the network format wars versus NetBIOS and SNA, to a degree that no one could have appreciated. I was in the elevator with one of the intellectual property attorneys at the firm—a fairly technical guy—when he said something like: "You know, if TCP/IP had been properly protected and patented, we could have rigged it so that every packet cost money; they really missed the boat on that one."
Where would the Internet be if this was true? I don't know, but I do know one thing: the Internet would not be running TCP/IP. So, enjoy the freedom of open source software. It is there for you!
Chapter 3: A Tale of Two Standards
Jeremy Allison
It was the best of protocols, it was the worst of protocols, it was the age of monopoly, it was the age of Free Software, it was the epoch of openness, it was the epoch of proprietary lock-in, it was the season of GNU, it was the season of Microsoft, it was the spring of Linux, it was the winter of Windows....
Samba is commonly used as the "glue" between the separate worlds of Unix and Windows, and because of that, Samba developers have to intimately understand the design and implementation decisions made in both systems. It is no surprise that Samba is considered one of the most difficult Free Software projects to understand and to join, outclassed in complexity only by the voodoo black art of Linux kernel development. Samba really isn't that hard, however, once you look at the different standards implemented in the two systems (although some of the decisions in Windows can cause raised eyebrows).
In developing Samba, we're creating a bridge between the most popular standards currently deployed in the computing world: the Unix/Linux standard of POSIX and the Microsoft-developed de facto standard of Win32. In this chapter, I will examine these two standards from an application programmer's perspective. In doing so, I thought it might be instructive to look at the reasons why each of them exists, what the intention for creating the particular standard might have been, and how well they have stood the test of time and the needs of programmers. A historical perspective is very important, as we look to the future and decide what standards we should encourage governments and businesses to support, and what effect this will have on the software landscape in the early 21st century.
Standard: (noun) A flag, banner, or ensign, especially: an emblem or flag of an army, raised on a pole to indicate the rallying point in battle.
The POSIX Standard
POSIX was named (like many things in the Unix software world) by Richard Stallman. It stands for Portable Operating System Interface-X, meaning a portable definition of a Unix-like operating system API. The reason for the existence of the POSIX standard is interesting and lies in the history of the Unix family of operating systems.
As is commonly known, Unix was created in 1969 at AT&T Bell Labs by Ken Thompson and Dennis Ritchie. Not originally designed for commercialization, the source code was shipped to universities around the world, most notably Berkeley in California. One of the world's first truly portable operating systems, Unix soon splintered into many different versions as people modified the source code to meet their own requirements. Once companies like Sun Microsystems and the original, prelitigious SCO (Santa Cruz Operation) began to commercialize Unix, the original Unix system call API remained the core of the Unix system, but each company added proprietary extensions to differentiate its own version of Unix. Thus began the first of the "Unix wars" (I'm a veteran, but I don't get disability benefits for the scars they caused). For independent software vendors (ISVs), such proprietary variants were a nightmare. You couldn't assume that code that ran correctly on one Unix would even compile on another.
During the late 1980s, in an attempt to create a common API for all Unix systems, and fix this problem, the POSIX set of standards was born. Because no one trusted any of the Unix vendors, the Institute of Electrical and Electronics Engineers (IEEE) shepherded the standards process and created the 1003 series of standards, known as POSIX. The POSIX standards cover much more than the operating system APIs, going into detail on system commands, shell scripting, and many other parts of what it means to be a Unix system. I'm only going to discuss the programming API standard part of POSIX here because, as a programmer, that's really the only part of it I care about on a day-to-day basis.
Few people have actually seen an official POSIX standard document, as the IEEE charges money for copies. Back before the Web became really popular, I bought one just to take a look at the real thing. It wasn't cheap (a few hundred dollars, as I recall). Amusingly enough, I don't think Linus Torvalds ever read or referred to it when he was creating Linux; he used other vendors' references to it and manpage descriptions of what POSIX calls were supposed to do.
First Implementation Past the Post
Any application program dealing with multiple access to files has to deal with file locking. File locking has several potential strategies, ranging from the "lock this file for my exclusive use" method, to the "lock these 4 bytes at offset 23 as I'm going to be reading from them soon" level of granularity. POSIX implements this kind of functionality via the fcntl() call, a sort of jack-of-all-trades for manipulating files (hence the name: "fcntl" is short for "file control"). It's not important to know exactly how to program this call. Suffice it to say that a code fragment to set up such a byte range lock looks something like this:
int fd = open("/path/to/file", O_RDWR);
Now, set up the struct flock structure (called flock_struct here) to describe the kind of byte range lock we need, and pass its address to fcntl():
int ret = fcntl(fd, F_SETLKW, &flock_struct);
If ret is zero, we got the lock. Looks simple, right? The byte range lock we got on the region of the file is advisory. This means that other processes can ignore it and are not restricted in terms of reading or writing the byte range covered by the region (that's a difference from the Win32 way of doing things, in which locks are mandatory; if a lock is in place on a region, no other process can write to that region, even if it doesn't test for locks). An existing lock can be detected by another process doing its own fcntl() call, asking to lock its own region of interest. Another useful feature is that once the file descriptor open on the file (int fd in the previous example) is closed, the lock is silently removed. This is perfectly acceptable and a rational way of specifying a file locking primitive; just what you'd want.
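To make the fragment above concrete, here is a minimal sketch of a complete byte range lock, assuming a Linux-like POSIX system and a local filesystem. The helper names (write_lock_on, advisory_lock_demo) and the layout of the demo are hypothetical and are not taken from Samba. The parent locks the 4 bytes at offset 23; a child process then detects the lock with its own fcntl() call and, because the lock is only advisory, writes into the range regardless.

```c
#define _XOPEN_SOURCE 700 /* expose fork(), ftruncate(), pwrite() in strict modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Describe an exclusive lock on `len` bytes starting `start` bytes into the file. */
static struct flock write_lock_on(off_t start, off_t len)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;    /* exclusive lock; F_RDLCK would be a shared one */
    fl.l_whence = SEEK_SET; /* l_start is measured from the start of the file */
    fl.l_start = start;
    fl.l_len = len;
    return fl;
}

/* Hypothetical demo. The parent locks 4 bytes at offset 23, then a child
 * process reports back through its exit status:
 *   bit 0 set: the child's F_GETLK probe saw the parent's lock
 *   bit 1 set: the child wrote into the locked range anyway
 * Returns that bitmask, or -1 if the demo could not be set up. */
int advisory_lock_demo(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    struct flock fl = write_lock_on(23, 4);
    if (ftruncate(fd, 64) == -1 || fcntl(fd, F_SETLKW, &fl) == -1) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: a different process, so the parent's lock conflicts with ours. */
        int seen = 0;
        int cfd = open(path, O_RDWR);
        if (cfd == -1)
            _exit(0);
        struct flock probe = write_lock_on(23, 4);
        if (fcntl(cfd, F_GETLK, &probe) != -1 && probe.l_type != F_UNLCK)
            seen |= 1; /* someone else holds a conflicting lock */
        if (pwrite(cfd, "data", 4, 23) == 4)
            seen |= 2; /* advisory: nothing stopped the write */
        _exit(seen);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    close(fd); /* closing the descriptor also drops the lock */
    unlink(path);
    return (waited == pid && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

Calling advisory_lock_demo() with the path of a scratch file should, on a typical local filesystem, return 3: the child both saw the lock and ignored it. Cooperating programs are expected to perform the F_GETLK or F_SETLK check themselves before touching the range.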
However, modern Unix processes are not single threaded. They commonly consist of a collection of separate threads of execution, separately scheduled by the kernel. Because the lock primitive has a per-process scope, this means that if separate threads in the same process ask for a lock over the same area, it won't conflict. In addition, because the number of lock requests by a single process over the same region is not recorded (according to the spec), you can lock the region 10 times, but you need to unlock it only once. This is sometimes what you want, but not always: consider a library routine that needs to access a region of a file but doesn't know if the calling processes have the file open. Even if an open file descriptor is passed into the library, the library code can't take any locks. It can never know if it is safe to unlock again without race conditions.
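The "lock it 10 times, unlock it once" behavior can be checked with a small experiment. This is a sketch under the same assumptions as before (a Linux-like POSIX system, a scratch file on a local filesystem); the function names are hypothetical. Because a process never conflicts with its own locks, the probe has to run in a child process.

```c
#define _XOPEN_SOURCE 700 /* expose fork() and ftruncate() in strict modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Apply `type` (F_WRLCK to lock, F_UNLCK to unlock) to `len` bytes at `start`. */
static int set_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Fork a child that asks whether another process holds a lock on the range.
 * Returns 1 if locked, 0 if free, -1 on error. */
static int locked_as_seen_by_child(const char *path, off_t start, off_t len)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        struct flock probe = {0};
        probe.l_type = F_WRLCK;
        probe.l_whence = SEEK_SET;
        probe.l_start = start;
        probe.l_len = len;
        int cfd = open(path, O_RDWR);
        if (cfd == -1 || fcntl(cfd, F_GETLK, &probe) == -1)
            _exit(2);
        _exit(probe.l_type == F_UNLCK ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical demo: lock the same range ten times, then unlock it once.
 * Returns 1 if the range was locked before the single unlock and free
 * after it, 0 otherwise, and -1 if the demo could not be set up. */
int single_unlock_releases_all(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, 64) == -1) {
        close(fd);
        return -1;
    }
    for (int i = 0; i < 10; i++) {
        if (set_range(fd, F_WRLCK, 23, 4) == -1) {
            close(fd);
            return -1;
        }
    }
    int before = locked_as_seen_by_child(path, 23, 4);
    int unlocked = set_range(fd, F_UNLCK, 23, 4); /* one unlock only */
    int after = locked_as_seen_by_child(path, 23, 4);
    close(fd);
    unlink(path);
    if (before == -1 || unlocked == -1 || after == -1)
        return -1;
    return before == 1 && after == 0;
}
```

This is exactly the trap for library code: if the caller already held a lock on the range, the library's single unlock silently releases the caller's lock too.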
Future Proofing
One of the great successes of POSIX is the ease with which it has adapted to the change from 32-bit to 64-bit computing. Many POSIX applications were able to move to a 64-bit environment with very little or no change, and the reason for that is abstract types.
In contrast to the Win32 API (which even has a bit-size dependency in its very name), all of the POSIX interfaces are defined in terms of abstract datatypes. A file size in POSIX isn't described as a "32-bit integer" or even as a C-language type of unsigned int, but as the type off_t. What is off_t? The answer depends completely on the system implementation. On small or older systems, it is usually defined as a signed 32-bit integer (signed because it is also used as a seek offset, which can be negative), and on newer systems (Linux, for example) it's defined as a signed 64-bit integer. As long as applications are careful to cast integer types only to the correct off_t type and use these for file-size manipulation, the same application will work on both small and large POSIX systems.
This wasn't done all at once, because most commercial Unix vendors have to provide binary compatibility for older applications running on newer systems, so POSIX had to cope with applications using 32-bit file sizes running alongside newer 64-bit-capable applications on the new 64-bit systems. The way to make this work was determined by the Large File Support working group, which finished its work during the mid-1990s.
The transition to 64 bits was seen as a three-stage process. Stage one was the original old 32-bit applications; stage two was seen as a transitional stage, where new versions of the POSIX interfaces were introduced to allow newer applications to explicitly select 64-bit sizes, and stage three was where all the original POSIX interfaces default to 64-bit clean.
As is usual in POSIX, the selection of what features to support was made available using compile-time macro definitions that could be selected by the application writer. The macros used were:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Wither POSIX?
The POSIX standard has not been static; it has managed to evolve (although some would argue too slowly) over time. A major step forward was the establishment of the Single Unix Specification (SUS), a superset of POSIX that was developed in 1998, adopted by all the major Unix vendors, and shepherded by the Unix standards body, "the Open Group." It was a great leap forward when this specification was finally made available for free on the Web from the Open Group web site at http://www.unix.org. It certainly saved me from having to hunt down cheap POSIX specifications in secondhand bookshops in Mountain View, California.
The expanded SUS now covers such issues as real-time programming, concurrent programming via the POSIX thread (pthread) interfaces, and internationalization and localization, but unfortunately it does not cover file Access Control Lists (ACLs). Sadly, that specification was never fully agreed on, and so has never made it into the official documents. Interestingly enough, the SUS also doesn't cover the GUI elements, because the history of Unix as primarily a server operating system has meant that GUIs have never been given the priority necessary for Unix to become a desktop system.
Looking at what happened with ACLs is instructive when considering the future of POSIX and the SUS. Because ACLs were sorely needed in real-world environments, individual Unix vendors, such as SGI, Sun, HP, and IBM, added them to their own Unix variants. But without a true standards document, they fell into their old evil ways and added them with different specifications. Then along came Linux....
Linux changed everything. In many ways, the old joke is true: Linux is the Unix defragmentation tool. As Linux became more popular, programs originally written for other Unixes were first ported to it, and then after a while were written for it and then ported to other platforms. This happened to Samba. Sun's SunOS on a SPARC system was, at first, our primary user platform, but after five years or so we rapidly migrated to Linux on Intel x86 systems. We now develop almost exclusively on Linux, and from there port to other Unix systems.
The Win32 (Windows) Standard
Win32 was named for an expansion of the older Microsoft Windows interface, renamed the Win16 interface once Microsoft was shipping credible 32-bit systems. I have a confession to make. In my career, I completely ignored the original 16-bit Windows on MS-DOS. At that time, I was already working on sane 32-bit systems (68000 based), and dealing with the original insane 8086 segmented architecture was too painful to contemplate. Win32 was Microsoft's attempt to move the older architecture beyond the limitations of MS-DOS and into something that could compete with Unix systems—and to a large extent Microsoft succeeded spectacularly.
The original 16-bit Windows API added a common GUI on top of MS-DOS, and also abstracted out the lower-level MS-DOS interfaces so that application code had a much cleaner "C" interface to operating system services (not that MS-DOS provided many of those). The Win32 Windows API was actually the "application" level API (not the system call level; I'll discuss that in a moment) for a completely new operating system that would soon be known as Windows NT ("New Technology"). This new system was designed and implemented by Dave Cutler, the architect of Digital Equipment Corporation's VMS system, long a competitor to Unix. It does share some similarities with VMS. The interface choice for applications was very interesting, sitting on top of a system call interface that looks like Figure 3-2.
Figure 3-2: Architecture of the Win32 API
The idea behind the Windows NT kernel was that it could host several "subsystem" system call interfaces, providing completely different application behavior from the same underlying kernel. It was meant to be a completely customizable operating system, providing different kernel "personalities" any ISV might require. The DOS subsystem and the (not-shown) 16-bit Windows subsystem were essential, as they provided backward compatibility for applications running on MS-DOS and 16-bit Windows; the new operating system would have gathered little acceptance had it not been able to run all the old MS-DOS and Windows applications. The OS/2 subsystem was designed to allow users of text mode OS/2 applications (OS/2 was at one time a Microsoft product) to port them to Windows NT.
The Tar Pit: Backward Compatibility
Now, as an example of where Win32 got things spectacularly wrong, I want to look at a horror from the past that unfortunately got added into the Win32 interfaces due to the MS-DOS heritage. My pet hate with Win32 is the idea of "share modes" on open files. In my opinion, this one single legacy design decision has probably done more than any other to hold back the development of cluster-aware network filesystems on Win32 systems.
Under POSIX, an open() call is very simple. It takes a pathname to open, the way in which you want to access or create the file (read, write, or both with various create types), and a permission mask that gets applied to files you do create. Under Win32, the equivalent call, CreateFile(), takes seven parameters, and the interactions among them can be ferociously complex. The parameter that causes all the trouble is the ShareMode parameter, which can take values of any of the following constants OR'ed together:
FILE_SHARE_READ
Allow others to open for read.
FILE_SHARE_WRITE
Allow others to open for write.
FILE_SHARE_NONE
Don't allow any other opens.
FILE_SHARE_DELETE
Allow open for delete intent.
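To see why these flags are costly, it helps to model the check they imply. The following is a toy model in portable C, not Windows code: the constants, the open_table type, and try_open() are all hypothetical stand-ins, and the compatibility rule is my reading of the documented behavior (a new open fails if it asks for access an existing opener declined to share, or declines to share access an existing opener already has).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the access and share bits. The same bit
 * position is used for "I want X" and "others may do X". */
enum { ACC_READ = 1, ACC_WRITE = 2, ACC_DELETE = 4 };
enum { SHARE_NONE = 0, SHARE_READ = 1, SHARE_WRITE = 2, SHARE_DELETE = 4 };

#define MAX_OPENS 16

struct open_entry {
    unsigned access; /* what this opener does with the file */
    unsigned share;  /* what this opener lets others do */
};

/* Every current open of ONE file; a real system needs this per file. */
struct open_table {
    struct open_entry entries[MAX_OPENS];
    size_t count;
};

/* Try to add an open. Returns 1 on success, 0 on a sharing violation,
 * and -1 if the toy table is full. */
int try_open(struct open_table *table, unsigned access, unsigned share)
{
    assert(table != NULL);
    for (size_t i = 0; i < table->count; i++) {
        const struct open_entry *e = &table->entries[i];
        if (access & ~e->share)
            return 0; /* we want access an earlier opener refused to share */
        if (e->access & ~share)
            return 0; /* an earlier opener has access we refuse to share */
    }
    if (table->count == MAX_OPENS)
        return -1;
    table->entries[table->count].access = access;
    table->entries[table->count].share = share;
    table->count++;
    return 1;
}
```

Note that the loop visits every existing open of the file, including opens that are read-only. On a single machine that table lives in one kernel; on a cluster, each file server would need a consistent view of it before it could answer a single open request.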
To make these semantics work, any Windows kernel dealing with an open file has to know about every other application on the system that might have this file open. This was fine back in the single-machine MS-DOS days, when these semantics were first designed, but it is a complete disaster when dealing with a clustered filesystem in which a multitude of connected file servers may want to give remote access to the same file, even if they serve out the file read-only to applications. They have to consult some kind of distributed lock management system to keep these MS-DOS-inherited semantics working. While this can be done, it complicates the job enormously and means cluster communication on every
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
World Domination, Fast
I've heaped enough opprobrium on Win32. Let's give it a break and consider something the designers really did get right, and one of the advantages it has over POSIX. I'm talking about the early adoption of the Unicode standard in Win32. When Microsoft was creating Win32, one of the things it realized was that this couldn't just be another English-only, American- and European-centric standard. It had to be able to not only cope with, but also encourage, applications written in all world languages (never accuse Microsoft of thinking small in its domination of the computing world).
Given those criteria, its adoption of Unicode as the native character set for all the system calls in Win32 was a stroke of genius. Even though the Asian countries aren't particularly fond of Unicode, because it merges several character sets they consider separate into one set of code points, Unicode is the best way to cope with the requirements of internationalization and localization in application development.
To allow older MS-DOS and Win16 applications to run, the Win32 API is available in two different forms, selectable at compile time by defining UNICODE (for example, with the compiler flag -DUNICODE). It also helps if you own the compiler market for Windows, as Microsoft does, as you can standardize tricks like this. The older code-page-based applications call Win32 libraries that internally convert any string arguments to 16-bit Unicode and then call the real Win32 library interface, which, like the Windows NT kernel, is Unicode only.
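The two-forms arrangement can be sketched in portable C. Everything here is a hypothetical stand-in: wide_char plays the role of a 16-bit Unicode code unit, name_length_w() is the "real" Unicode-only routine, and name_length_a() is the compatibility entry point that converts and forwards. The conversion assumes the ISO 8859-1 (Latin-1) code page, chosen only because its bytes map one-to-one onto the first 256 Unicode code points; real code pages need lookup tables. In the real Windows headers the same pattern appears as pairs such as CreateFileA and CreateFileW, with CreateFile defined as a macro that picks one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t wide_char; /* stand-in for a 16-bit Unicode code unit */

/* The "real" implementation: Unicode only. Counts code units in `name`. */
size_t name_length_w(const wide_char *name)
{
    size_t n = 0;
    assert(name != NULL);
    while (name[n] != 0)
        n++;
    return n;
}

/* Convert a Latin-1 string to 16-bit code units. Caller frees the result.
 * Returns NULL if memory cannot be allocated. */
wide_char *latin1_to_wide(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    wide_char *out = malloc((len + 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i <= len; i++) /* <= also copies the terminator */
        out[i] = (wide_char)(unsigned char)s[i];
    return out;
}

/* Compatibility entry point: convert the 8-bit argument, then call the
 * Unicode version. Returns 0 if the conversion fails. */
size_t name_length_a(const char *name)
{
    wide_char *wide = latin1_to_wide(name);
    if (wide == NULL)
        return 0;
    size_t n = name_length_w(wide);
    free(wide);
    return n;
}

/* One source-level name, two implementations, chosen at compile time. */
#ifdef UNICODE
#define name_length name_length_w
#else
#define name_length name_length_a
#endif
```

Application code calls name_length() and gets whichever form matches the way it was compiled, while only the wide version contains any real logic.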
In addition, Win32 comes with a full set of library interfaces to split out the text messages an application may need to display into resource files so that ISVs can easily have them translated for a target market. This eases the internationalization and localization burdens considerably for vendors.
What is more useful, but not as obvious, is that making the Win32 standard natively use Unicode meant developers were immediately confronted with the requirements of multilingual code development. Many applications written in English-speaking (or Western European eight-bit character set-compatible) countries are badly written, making the assumption that a character will always fit within one byte. The early versions of Samba definitely made that mistake, and retrofitting multibyte character set handling into old code is a real bear to get right. I know, because I was the person who first had to work on this for Samba (later I got some much-needed help from Andrew), so I may be a little touchy on this subject.
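The one-byte assumption is easy to demonstrate. The sketch below uses UTF-8 as the multibyte encoding, which is my choice for illustration rather than anything the text specifies (the code pages Samba had to handle differ in detail), and utf8_char_count() is a hypothetical helper that assumes well-formed input.

```c
#include <assert.h>
#include <stddef.h>

/* Count characters (code points) in a well-formed UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx and are skipped,
 * so each character is counted once, at its first byte. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the word "naïve" encoded in UTF-8, strlen() reports 6 bytes while utf8_char_count() reports 5 characters. Code that truncates, pads, or indexes such a string by byte count can split a character in half.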
Wither Win32?
As with POSIX, the Win32 standard has not remained static over time. Microsoft has continued to develop and extend it, and has the advantage that anything it publishes immediately becomes the "standard," as is the case with all single vendor-defined standards.
However, Microsoft is attempting to deemphasize Win32 as it moves into its new .NET environment and the new world of "managed code." Managed code is code running under the control of an underlying virtual machine (called the Common Language Infrastructure, or CLI, in .NET) and can be made to prevent the direct memory access that is the normal mode of operation of an API designed for C coding, such as Win32 or POSIX. Free Software is also making a push into this area, with the Mono project, which implements the Microsoft C# language and .NET-managed code environment on Linux and other POSIX systems.
Even if Microsoft is as successful as it hopes to be in pushing ISV programmers to convert to .NET and managed code using its new C# language, the legacy of applications developed in C using the Win32 API will linger for decades to come. ISV programmers are an ornery lot, especially people who have mastered the Win32 API, due to its less-than-complete documentation.
What seems to happen over the years is that experienced Win32 programmers gain a sort of folk knowledge about the Win32 APIs, that is, how they really work versus what the documentation says. I often hang out on Usenet Windows discussion groups, and the attitudes of the experienced Windows programmers are very interesting: they usually hate telling novices how stuff works. It's almost as if having learned Windows is a badge of honor, and they don't want to make earning that too easy for the neophytes. They exude an air of "they must suffer as I did."
As Microsoft becomes less interested in Win32 with the release of its new Longhorn Windows client and the move to managed code, is it possible for Microsoft to lose control of it? The POSIX standard is so complete because it was designed to allow programmers reading the standards documents to re-create a POSIX system from scratch. The Win32 standard is nowhere near as well documented as that. However, there is hope in the Wine project, which is attempting to re-create a version of the Win32 API that is binary compatible with Windows on Intel x86 systems. Wine is, in effect, a second implementation of the Win32 system, making it closer to a true vendor-independent standard. Efforts taking place at companies such as CodeWeavers and Transgaming Technologies are very promising; I just finished playing the new Windows-only game Half-Life 2 on my desktop Linux system, using the Wine technology. This is a significant achievement for the Wine code and bodes well for the future.
Choosing a Standard
Between two evils, I always like to take the one I've never tried before.
Mae West
So, what should we choose when examining what standards to support and develop applications for? What should we recommend to businesses and governments that are starting to look closely at the open source/free software options available?
It's important that businesses and governments selecting standards-based products pay attention to open standards. No more of the Microsoft Word .DOC format standard (which suffers from the same problem as Win32 in terms of it being single-vendor controlled). No de facto vendor standards, no matter how convenient. They need to select standards that are at the same level as POSIX—namely, standards to the level that other implementations can be created from the documentation. It's simple to tell when a standard meets that criterion because other implementations of it exist.
The interesting thing is that both POSIX and Win32 standards are now available on both systems. On Linux, we have the POSIX standard as native, and the Wine project provides a binary-compatible layer for compiled Win32 programs that can run many popular Win32 applications. Perhaps more interestingly for programmers, the Wine project also includes a Linux shared library, winelib, which allows Win32 applications to be built from source code on POSIX systems. What you end up with is an application that looks like a native Windows application, but can be run on non-Intel platforms. Early versions of Windows NT used to support such platforms, but Windows is now restricted to x86-compatible processors. Taking your Win32 application and porting it using winelib is an easy way to get your feet wet in the POSIX world, although it won't look like a native Linux application (this may be a positive thing if your users are used to a Windows look and feel).
If you've already gone the .NET and C# route, using the Mono project may enable your code to run on POSIX systems.
On Windows, there is now a full POSIX subsystem, supported by Microsoft and available for free. Earlier I alluded to Microsoft's reluctance to release information on how to create new subsystems for the Windows NT kernel, but it turns out that earlier in its history Microsoft was not so careful. A small San Francisco-based company, Softway Systems, licensed the documentation and produced a product called OpenNT (later renamed Interix), which was a replacement for Microsoft's originally crippled POSIX subsystem. Unfortunately, OpenNT didn't sell very well; someone cruelly referred to it as having "all the application availability of Linux, with the stability of Windows." As the company was failing, Microsoft bought it (probably to bring the real gem of the Windows kernel subsystem interface knowledge back in-house) and used it to create its Services for Unix (SFU) product. SFU contains a full POSIX environment, with a software development kit allowing applications to be written that have access to networking and GUI APIs. The applications written under it run as full peers with the mature Win32 applications, and users can't tell the difference.
Chapter 4: Open Source and Security
Ben Laurie
More than two years ago, in a fit of frustration over the state of open source security, I wrote my first and only blog entry (for O'Reilly's Developer Weblogs):
June and July were bad months for free software. First Apache chunked encoding vulnerability, and just when we'd finished patching that, we get the OpenSSH hole. Both of these are pretty scary—the first making every single web server potentially exploitable, and the second makes every remotely managed machine vulnerable.
But we survived that, only to be hit just days later with the BIND resolver problems. Would it ever end? Well, there was a brief respite, but then, at the end of July, we had the OpenSSL buffer overflows.
All of these were pretty agonising, but it seems we got through it mostly unscathed, by releasing patches widely as soon as possible. Of course, this is painful for users and vendors alike, having to scramble to patch systems before exploits become available. I know that pain only too well: at The Bunker, we had to use every available sysadmin for days on end to fix the problems, which seemed to be arriving before we'd had time to catch our breath from the previous one.
But I also know the pain suffered by the discoverer of such problems, so I thought I'd tell you a bit about that. First, I was involved in the Apache chunked encoding problem. That was pretty straightforward, because the vulnerability was released without any consultation with the Apache Software Foundation, a move I consider most ill advised, but it did at least simplify our options: we had to get a patch out as fast as possible. Even so, we thought we could take a little bit of time to produce a fix, since all we were looking at was a denial-of-service attack, and let's face it, Apache doesn't need bugs to suffer denial of service—all this did was make it a little cheaper for the attacker to consume your resources.
That is, until Gobbles came out with the exploit for the problem. Now, this really is the worst possible position to be in. Not only is there an exploitable problem, but the first you know of it is when you see the exploit code. Then we really had to scramble. First we had to figure out how the exploit worked. I figured that out by attacking myself and running Apache under gdb. I have to say that the attack was rather marvelously cunning, and for a while I forgot the urgency of the problem while I unravelled its inner workings. Having worked that out, we were in a position to finally fix the problem, and also, perhaps more importantly, more generically prevent the problem from occurring again through a different route. Once we had done that, it was just a matter of writing the advisory, releasing the patches, and posting the advisory to the usual places.
Many Eyes