One of my favorite presentations from last year was Theo Schlossnagle's talk on Whack-a-mole, so when I saw that he was giving a full tutorial on scalability this year, I had to go and check it out. I wasn't disappointed this year either -- Theo presented a solid tutorial that exuded his practical experience in this field. Of course it's impossible to summarize a four-hour tutorial in a blog entry, so I'll try to summarize Theo's three simple rules that he applied repeatedly in his presentation:
Theo kept reaching back to these points throughout his talk. Another point he stressed repeatedly was the need for a clearly established release procedure. Unless the team that rolls out new software onto production servers has documented procedures to follow, mistakes will be made. And as systems grow in size, the likelihood of fatal errors increases dramatically. To spare yourself this fate, document your release procedure and use a version control system to keep track of everything you do. Again, this seems like common sense, but often it is not.
Aside from general rules, Theo covered a number of open source solutions that can eliminate the need for expensive dedicated hardware boxes like fail-over switches and load balancers. My favorite example was the use of the Whack-a-mole toolkit for handling the failure of a machine. The toolkit allows an architecture to determine when a server fails and automatically reshuffle the work that the dead server covered. Using Whack-a-mole saves money: instead of buying expensive redundant or fail-over systems, you can rely on commodity hardware alone.
Another great tool that Theo covered is the Spread toolkit, which allows multiple machines to communicate easily and coherently. With Spread, machines create a communication channel that is shared and sequenced between all the computers that have joined it. Each listener on the channel receives all of the messages posted to the channel in the same order as everyone else -- this is the important feature that allows the toolkit to be used in mission-critical, high-availability setups. My favorite application of this toolkit is a multi-server logging facility, where multiple machines write their log entries to a Spread channel and one machine writes a single, correctly interleaved log file for all of them.
Theo's tutorial set the stage for people who are facing scalability issues -- he presented a lot of thoughts and hard-earned experience from his extensive past. Scaling issues generally depend heavily on the system at hand, and having a general set of rules has given me a framework in which to think about scalability in my own projects.
Robert Kaye is the Mayhem & Chaos Coordinator and creator of MusicBrainz, the music metadata commons.
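To make the multi-server logging idea a bit more concrete, here is a minimal sketch in Python. It does not use the real Spread API: `SequencedChannel` and `LogCollector` are hypothetical names of my own, and the channel is a toy in-process stand-in that imitates only the one property that matters here, namely that every listener sees every message in the same order.

```python
import itertools


class SequencedChannel:
    """Toy stand-in for a totally ordered group channel (NOT the Spread API)."""

    def __init__(self):
        self._next_seq = itertools.count()  # one shared sequence for the channel
        self._listeners = []

    def join(self, listener):
        """Register a callable that receives (seq, sender, text)."""
        self._listeners.append(listener)

    def publish(self, sender, text):
        """Stamp the message with the next sequence number and deliver it
        to every listener, so all listeners observe the same order."""
        seq = next(self._next_seq)
        for listener in self._listeners:
            listener(seq, sender, text)
        return seq


class LogCollector:
    """Listener that builds one interleaved log from all senders."""

    def __init__(self):
        self.lines = []

    def __call__(self, seq, sender, text):
        self.lines.append(f"{seq:04d} {sender} {text}")


def demo():
    channel = SequencedChannel()
    primary, mirror = LogCollector(), LogCollector()
    channel.join(primary)
    channel.join(mirror)

    # Two hypothetical web servers post log entries to the same channel.
    channel.publish("web1", "GET /index.html 200")
    channel.publish("web2", "GET /about.html 200")
    channel.publish("web1", "GET /missing 404")
    return primary.lines, mirror.lines


if __name__ == "__main__":
    for line in demo()[0]:
        print(line)
```

Because the ordering is decided by the channel rather than by each machine's clock, a second collector that joins the same channel ends up with an identical log. In a real deployment the sequencing is done across the network by the group communication layer, which this sketch does not attempt to model.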
oreillynet.com Copyright © 2006 O'Reilly Media, Inc.