[Editor's Note: Peter Drescher's last article for O'Reilly Digital Media ("Could Ringtones BE More Annoying?!") was such a hit that we asked him to expand on the concept. The following article is an edited transcript of his session "Creating Audio for Mobile Games," which he presented last month at the Game Developers Conference. To simulate the live experience, we've sprinkled audio and video clips throughout the story, corresponding with the ones Drescher played from his T-Mobile Sidekick during the presentation.]
Good afternoon. My name is Peter Drescher, and I am currently sound designer at Danger, Incorporated, makers of the Hiptop mobile internet device, also known as the T-Mobile Sidekick. I used to be a road-dog bluesman piano player until about 15 years ago, when I managed to get a job doing audio for multimedia. Since then, and through no planning on my part, I've become something of an expert on making little, tiny itsy-bitsy teeny-weeny audio files sound like, well ... anything at all.
First I did soundtracks for After Dark screen savers that shipped on floppy disks. Then it was music stamps for the General Magic device, web audio for 56k modems, and these days, ringtones and game audio for mobile devices. And that's what I'm talking about today.
Now, when I say "mobile device," I'm talking about a cell phone or "smart" phone, but not the PSP or Game Boy. In the context of this presentation, a mobile device is defined as something that contains a radio for transmitting and receiving voice and other data over a cell phone network. We are talking about devices with fairly sophisticated operating systems, like Nokia and Motorola cell phones, or devices like the Treo and the Sidekick, because we are concerned with interactive game audio produced by the device OS (as opposed to phone conversations, FM radio broadcasts, or iPod libraries).
Figure 1. Peter Drescher plays a game while listening to music on his custom black nanoHiptop.
When making sound effects and background music for any game, you're trying to enhance gameplay and create a fun, exciting audio experience for the user. The last thing you want to do is be annoying, irritating, painful, or trite.
You really don't want to be repetitive.
You really don't want to be redundant.
You really don't want to play the same sound over and over again.
You really get the idea.
Unfortunately, this is frequently exactly what happens when your entire audio budget is less than 100 kilobytes, as is mostly the case in resource-constrained situations like the current mobile environment. Given that one second of mono "CD quality" audio (16-bit, 44.1kHz) is about 86K, you're going to have to use some tricks of the trade if you want to have more than a single sound effect in your game.
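To make that budget arithmetic concrete, here is a minimal Python sketch. It assumes mono, 16-bit audio at 44.1kHz (stereo would double every figure); the function names are purely illustrative.

```python
# Back-of-the-envelope audio budget arithmetic.
SAMPLE_RATE_HZ = 44_100   # CD sample rate
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # assumption: mono; stereo doubles the size


def bytes_per_second(rate_hz=SAMPLE_RATE_HZ,
                     bytes_per_sample=BYTES_PER_SAMPLE,
                     channels=CHANNELS):
    """Size of one second of uncompressed PCM audio, in bytes."""
    return rate_hz * bytes_per_sample * channels


def seconds_in_budget(budget_bytes):
    """How many seconds of uncompressed mono CD-quality audio fit in a budget."""
    return budget_bytes / bytes_per_second()


print(round(bytes_per_second() / 1024, 1))      # 86.1 (kilobytes per second)
print(round(seconds_in_budget(100 * 1024), 2))  # 1.16 (seconds in 100K)
```

In other words, a 100K budget holds a little over one second of uncompressed audio, which is why everything that follows is about compression and reuse.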
The most important trick is to be as ruthlessly efficient as possible. You want to squeeze every last drop of variation out of each and every byte of audio data at your disposal. Repetition is the enemy, compression is your ally, and creative use of limited resources is your battle cry. If you're making sound effects, this means low-resolution, highly compressed samples used in multiple ways. If you're writing music, it means MIDI. Using the techniques I'll be demonstrating here, it is possible to produce interesting, interactive, non-repetitive, evolving soundtracks for mobile games using absurdly tiny files. The real difficulty comes when trying to make these little tiny pieces of digital audio crap actually sound good. But it can be done, and here's how we did it.
Cheese Racer is a game, based on Rally X, that uses a number of techniques to produce lots of sound and music in very little space, including:
Figure 2. This MIDI sequence contains all of the tracks used for the Cheese Racer game. It plays instruments from the built-in General MIDI soundbank, along with custom samples contained in an RMF file. (Click to enlarge.)
Let's look at the music soundtrack first. Figure 2 is a screen shot of the MIDI sequence used in the game, displaying the complete track layout. Since this game was designed to run on the Hiptop, which uses the Beatnik Audio Engine as part of the OS, we were able to use an RMF file to create the soundtrack. This means that during gameplay, the MIDI sequence pictured here is rendered on instruments from the internal General MIDI soundbank, and on custom software instruments specifically designed for the game.
For example, the top two tracks play custom samples of drum loops:
Next we have a group of three tracks, playing bass, melody, and chords:
In the code for the first level, we've defined ten different combinations of tracks: percussion + bass, percussion + bass + melody, bass + melody + chords, etc. You can hear what it sounds like by clicking the movie clip in Figure 3.
Figure 3. Notice how the mix changes every time the mouse gets a piece of cheese. (Click image to play movie clip.)
Figure 4. Every time you play the game, the soundtrack is different. (Click image to play movie clip.)
Because the music loop is 40 seconds long and we defined ten different combinations of tracks, each level contains more than six minutes of different music mixes. Of course, not every mix is played for 40 seconds—in fact, that's kind of the point. Because the mix changes depending on gameplay, the music will never play exactly the same way twice, thereby increasing variation and decreasing "ear fatigue."
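The track-combination idea can be sketched in a few lines of Python. This is not the actual game code: the track names, the ten specific combinations, and the `next_mix` helper are all assumptions made up for illustration. The point is only that one MIDI loop plus a table of mute settings yields many mixes.

```python
import random

LOOP_SECONDS = 40
TRACKS = ("perc_1", "perc_2", "bass", "melody", "chords")

# Hypothetical mix table: each entry lists the tracks left unmuted.
MIXES = [
    frozenset({"perc_1", "bass"}),
    frozenset({"perc_1", "bass", "melody"}),
    frozenset({"bass", "melody", "chords"}),
    frozenset({"perc_1", "perc_2"}),
    frozenset({"perc_1", "perc_2", "bass"}),
    frozenset({"perc_2", "melody"}),
    frozenset({"perc_2", "bass", "chords"}),
    frozenset({"perc_1", "melody", "chords"}),
    frozenset({"perc_1", "perc_2", "bass", "melody"}),
    frozenset(TRACKS),
]


def mute_flags(mix):
    """Map every track to True if it should be muted in this mix."""
    return {track: track not in mix for track in TRACKS}


def next_mix(current_index, rng=random):
    """Pick a different mix at random, e.g. when a piece of cheese is collected."""
    choices = [i for i in range(len(MIXES)) if i != current_index]
    return rng.choice(choices)


total_seconds = len(MIXES) * LOOP_SECONDS
print(total_seconds, divmod(total_seconds, 60))  # 400 (6, 40)
```

Ten mixes of a 40-second loop give 400 seconds, or 6 minutes 40 seconds, of distinct material per level without adding a single byte of sample data.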
But wait! There's more! In fact, there are two more sets of tracks playing two more styles of music, using the same tempo and percussion tracks as the first level. Therefore the entire game contains almost 20 minutes of various music mixes, using only 68K of compressed sample and MIDI data. Click on Figure 4 to hear a different performance. This time, notice the use of "bumpers" to smooth out transitions between mixes, and when moving to the next level. (Special thanks to Lucas Finklestein, Danger QA engineer and game player extraordinaire, for helping me film these sequences.)
Remember, you want as much variation as possible, and you really want to avoid hearing the same sound over and over again. This game contains only a single "trumpet fall" sound effect, but we modify the playback sample rate so that each time you pick up a piece of cheese, the sound is played at a different pitch ... any similarity to the 1960s Batman theme is entirely intentional:
We also randomly vary the pitch during gameplay, so there's no repeating pattern to it:
Figure 5. By randomly varying the pitch of the car horn beeps, we produced multiple sounds using only one file. (Click image to play movie clip.)
The sound tends to mask the transition from one mix to another, helping to create a more seamless audio experience. In Figure 5, you can hear the same kind of pitch-shifting effect applied to car horn beeps.
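As a rough sketch of the pitch-variation trick: changing the playback sample rate by a factor of 2^(n/12) shifts the pitch by n semitones. The function names and the four-semitone range below are assumptions for illustration, and how you actually set the playback rate depends on your audio engine.

```python
import random


def playback_rate(base_rate_hz, semitones):
    """Playback sample rate that shifts pitch by `semitones` (12 per octave)."""
    return base_rate_hz * 2 ** (semitones / 12)


def random_pitch_rate(base_rate_hz, max_semitones=4, rng=random):
    """Pick a random rate within +/- max_semitones of the original pitch."""
    semitones = rng.randint(-max_semitones, max_semitones)
    return playback_rate(base_rate_hz, semitones)


print(playback_rate(11025, 12))   # 22050.0 (one octave up)
print(playback_rate(11025, -12))  # 5512.5  (one octave down)
```

One stored sample plus a random rate on each trigger behaves like a small family of related sounds.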
Another way to save space is using the "pitch it up, play it down" technique. Here's how it works: take your original, high-resolution sound effect and transpose it up an octave, halving the length. Then convert it to a low-resolution compressed format, like this:
Then in the game, play it down an octave:
Although the game sound might be a little crunchy, you've just cut the size of your file in half without losing too much audio fidelity. Obviously, the higher you pitch the sound, the "crunchier" the playback will become, but the technique can be used for custom instruments as well as sound effects, and is particularly effective on the tiny speakers in mobile devices.
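Here is a toy Python illustration of "pitch it up, play it down." It stands in for what a sample editor and the audio engine would do: the crude pair-averaging decimation and linear-interpolation playback are assumptions chosen for brevity, not the tools used on the games.

```python
import math


def pitch_up_octave(samples):
    """Halve the length by averaging each pair of samples (crude 2:1 decimation)."""
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]


def play_down_octave(samples):
    """Simulate half-rate playback by interpolating back to double length."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) / 2)
    return out


# A slow sine wave standing in for the original sound effect.
original = [math.sin(2 * math.pi * n / 100) for n in range(200)]
stored = pitch_up_octave(original)    # what ships in the game
restored = play_down_octave(stored)   # what the player hears
print(len(original), len(stored), len(restored))  # 200 100 200
```

The stored version is half the size, and the restored version is close to the original for low-frequency content. Anything above the new, lower Nyquist limit is lost, which is where the crunchiness comes from.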
Duke Ellington said about arranging, "Always write for your players." In other words, if you write a horn line you know your horn section can play well, then your arrangement will be well-played and your music will sound good, kinda by definition. The same is true when creating audio for mobile games: you have to write for your players, which in this case is usually a speaker the size of your thumbnail.
First of all, this means no bass. I mean, none, nada, fuggedaboutit. Don't write music that gets its power or groove from a deep funky bass line, because nobody's ever going to hear it. Snap of the snare drum, yes; boom of the kick drum, nnnnnot so much.
The same thing goes for sound effects. When making a car crash, pick a sound that has a lot of high end, then EQ the bottom right out. Applying a highpass filter at around 250Hz and maybe adding a bump around 3kHz will prevent low-end rumble from distorting the speaker, and make the sound pop where the speaker is most sensitive.
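For readers who want to see what the highpass step does numerically, here is a minimal first-order (one-pole) highpass filter in Python. It is only a sketch: a real EQ plugin would typically use a steeper filter, the 3kHz bump is not modeled, and the 11,025Hz sample rate is an assumption.

```python
import math


def highpass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC-style highpass: attenuates content below cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def sine(freq_hz, sample_rate_hz, seconds=1.0):
    """Generate a unit-amplitude test tone."""
    count = int(sample_rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]


RATE = 11025  # assumed sample rate for the example
rumble = highpass(sine(50, RATE), 250, RATE)     # low-end rumble
sizzle = highpass(sine(3000, RATE), 250, RATE)   # high-end content
half = RATE // 2  # skip the start so the filter has settled
print(round(max(abs(s) for s in rumble[half:]), 2))
print(round(max(abs(s) for s in sizzle[half:]), 2))
```

With a 250Hz cutoff, the 50Hz tone comes out at roughly a fifth of its original level, while the 3kHz tone passes almost untouched.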
Figure 6. Waves L1 is an extraordinarily useful plugin that helps audio designed for cell phone speakers play loud and clear.
Another thing you're going to want to do is apply a liberal dollop of L1 to almost every sound you make (see Figure 6). L1 is a compressor/limiter plugin from Waves that squishes the peaks and raises the overall volume of a sound file; it is used to fill up as much of the available digital range as possible, as shown on the right side of Figure 6.
This does two good things. First, it gives you a strong signal so that your sound will play loud and clear on the tiny speaker, and second, it gives your downsampling and compression algorithms as much meat as possible to chew on while they decimate your audio data to save space. When processing a high-resolution, CD-quality sound effect down to an 11kHz, IMA 4:1 compressed sample, a soft sound like the one on the left side of Figure 6 is going to come out sounding a lot like static.
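The "meat to chew on" point can be shown with a deliberately simplified model. The sketch below uses plain peak normalization (not the L1 limiter's algorithm) and fixed 4-bit rounding (not real IMA ADPCM, which adapts its step size), so treat it as an illustration of the principle under those assumptions rather than a simulation of the actual tools.

```python
import math


def normalize(samples, peak=1.0):
    """Scale samples so the loudest one reaches `peak`."""
    current = max(abs(s) for s in samples)
    if current == 0:
        return list(samples)
    gain = peak / current
    return [s * gain for s in samples]


def quantize(samples, bits):
    """Round samples in [-1, 1] onto a signed grid with `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    return [round(s * levels) / levels for s in samples]


def distinct_levels(samples, bits):
    """Count how many different output values survive quantization."""
    return len(set(quantize(samples, bits)))


# A soft sound: a sine wave using only 5% of the available range.
quiet = [0.05 * math.sin(2 * math.pi * n / 100) for n in range(1000)]
loud = normalize(quiet)

print(distinct_levels(quiet, 4), distinct_levels(loud, 4))  # 1 15
```

At this crude resolution the quiet version collapses to a single level (silence), while the normalized version uses all 15 available steps.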
This doesn't mean you can't make good sounds on cell phones; you just have to write for your players. For mobile music, melody is king. For sound effects, well, there is one category of sound that cell phones are specifically designed to produce well: vocals. Voice-overs, screams, laughs, sneezes, sighs, grunts, groans, pretty much any sound a mouth can make is guaranteed to come through a cell phone speaker with some sort of fidelity. Here's a good example.
Bob is a game that uses a number of interactive audio techniques to create a varied soundtrack. The music mix changes as the player progresses through each level, the frequently played bounce sound effect is varied depending on how fast the sprite is falling, and Bob, being a loud happy guy, says things like "Yeah!" when he gets a power-up, "Whoa" when he falls too fast, and "All right" when it's time to play. And if these vocals sound a little familiar, yes, it's true, I admit it ... I am the voice of Bob.
Figure 7. The whimsical Bob is one of Danger's best selling games. (Click image to play movie clip.)
Notice how the music mix changed as we played the game. The bass line got faster, the electric piano came in, etc. Each level has six or seven different combinations of instruments and tracks that flow from one to the other as you continue forward, with a full mix playing when you complete the level. That way, the music is always changing, and there's a little reward at the end.
Figure 8. This view of the tile editor shows a section of one level with the bits for "Audio Mix #5" shown as a blue line.
This variation is accomplished by setting bits in the level map using an editor as shown in Figure 8. Each black dot represents some sort of special tile bit (bounce, enemy, heart, etc.) and the bit for "Audio Mix #5" is highlighted in blue. Therefore, when Bob passes through that thin blue line (and he always will at some point, because the game is a side-scroller), the music will change to mix #5. Later, if you don't fall into the bottomless pits at mid-screen, you'll hit the next audio zone, shown in Figure 8 as a line of black dots in the sky at the right, and then Mix #6 will play.
Figure 9. Why is this soundtrack called "the Secret Yanni Technique?" Click the image to play the movie clip.
Now we get to one of my favorite tricks of the trade, which I like to call the "Secret Yanni" technique for reasons that are too arcane to explain here. Basically, the idea is this: video games are not movies; there's no concept of "sync to picture" because you can't always predict when a sound effect will be played. But knowing this, you can use serendipity to your advantage, as we did in Bob with the sound effects for bonus points.
In order to save space, we wanted to play a few MIDI notes every time Bob picked up a heart. MIDI is an excellent (and in many cases, the only) choice for making sound in the mobile environment because it is an extremely efficient and flexible technology. But the question becomes, "Which notes do you play?" There's no way to know when Bob is going to pick up points, so there's no way to know when the bonus sound will play or how it will fit with the music. You want the effect to be "Yay!" but if the notes clash with the music, the effect is going to be more like "Ouch!"
The answer? Modal music. You restrict your soundtrack to a single mode, no key changes, no fancy chord progressions. Just keep it simple, baby—and in many cases, this is a good way to write background music anyway. For Bob, the chords are mostly ii-IV-V-I in C, and then I picked notes for the bonus points that fit the mode, using a pentatonic scale consisting of C, D, E, G, and A. If you want to get really clever, you can have the notes play at the same tempo as the music, or some polyrhythmic fraction of it. Arranging things in this manner will produce sound effects that almost never clash with the music, and sometimes will even seem to be part of it. (Click on Figure 9.)
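In code, the note-picking side of this might look something like the following Python sketch. The MIDI note numbers for C, D, E, G, and A are standard (middle C is 60); the two-octave range, the function names, and the idea of snapping to the next eighth note are assumptions added for illustration.

```python
import math
import random

# C major pentatonic as pitch classes: C, D, E, G, A.
PENTATONIC = (0, 2, 4, 7, 9)
MIDDLE_C = 60


def bonus_notes(count, octaves=2, rng=random):
    """Pick `count` MIDI notes from the pentatonic scale above middle C."""
    pool = [MIDDLE_C + 12 * octave + pc
            for octave in range(octaves) for pc in PENTATONIC]
    return [rng.choice(pool) for _ in range(count)]


def next_grid_time(now_s, tempo_bpm, divisions_per_beat=2):
    """Delay a note to the next rhythmic grid point so it lands in tempo."""
    step = 60.0 / tempo_bpm / divisions_per_beat
    return math.ceil(now_s / step) * step


print(bonus_notes(3, rng=random.Random(7)))
print(next_grid_time(1.1, 120))  # 1.25
```

Because every candidate note belongs to the same mode as the music, whichever ones get picked will sit comfortably over the chords, whenever the player happens to trigger them.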
Figure 10. Meme is a super-chimp, fighting his way past mad scientists and evil robots by hitting them with bananas. (Click image to play movie clip.)
One way to avoid the problems associated with background music is not to have any. Another Danger game, Meme, dispenses with music altogether, and uses mood-producing sounds instead. The "evil laboratory" level takes the "pitch it up, play it down" technique to an extreme degree in order to solve a difficult problem: how do you create a low, creepy ambience using very little memory space? Traditionally, looping ambiences have to be fairly long (and therefore fairly large), otherwise you quickly hear the unnatural repetition of the same sound playing over and over. The other problem: How do you produce a Star Trek bridge-style rumble on a tiny cell phone speaker?
The answer, of course, is ... you don't. It's impossible. Here's what you do instead: Make a fairly high-pitched sound with a little blip in it and a little looped vibrato, like this:
Then make a custom RMF instrument out of it so you can play it at different pitches and layer it against itself. Then try playing it lower and lower until the sound starts to disappear, then bring it back up a little. That way, you get the lowest possible hum, the sound itself is very crunchy (which kinda works in this situation), and so it sounds something like Figure 10.
Man, that's a noisy game! You don't really miss the music much, do you? Other techniques demonstrated are the random pitching up and down of the banana whooshes, and prominent use of "vocal" sound effects (in this case, chimp noises) which punch through the tiny speaker loud and clear.
We also do the "single sample used in multiple ways" trick for the robot factories. Each time a robot is manufactured, there's this grating, grinding noise. It's the electrical-zap-played-down-an-octave machine:
Figure 11. Playing the zaps and pows at different pitches can produce interesting effects without using additional space. (Click image to play movie clip.)
During gameplay, the bottom end goes away because the speaker's too small to produce it, leaving only the higher-end grungy sound. Maybe it's not the sound effect you would have first designed for a robot factory, but it sorta kinda works and it doesn't use any extra space.
But my favorite use of that trick in this game is the cannon sound, which is the bullet played two octaves down:
This was totally unplanned; I was simply playing around with pitch-shifting and thought, "Cool, man, we got to use that!" Sometimes, happy accidents are your most valuable tool. Check out Figure 11.
I have a certain advantage in producing these kinds of interactive soundtracks because:
First, you'll have to see if your phone's operating system gives you access to the kind of APIs you'll need to implement things like pitch shifting and track muting. Some do, some don't. Then you'll need to check if the audio subsystem can even do what you want it to. If you're running Beatnik, you're probably golden; if not, well, good luck. The audio capabilities of cell phones and audio engines vary widely, from the completely rudimentary to the arcanely sophisticated.
To make good decisions about how to squeeze the maximum amount of sound out of the minimum amount of space, you'll want to design your audio taking into account the technical limitations of each platform your game is intended to run on. You'll need to know what file types are supported, at which resolutions, and what compression algorithms are available.
If you can run hi-res MP3s, you can pretty much make any sound you like, but if you're limited to, say, 8kHz IMA WAV files, you might want to consider using short, loud, uncomplicated sounds. These will translate better than complex noise, so forget that symphonic sample and use a single flute line instead. If the audio engine's output rate is 44k, rocking! Go for that screaming guitar solo, but if you're constrained to 11kHz, you're up against Nyquist, so you might be better off with a midrange piano solo. And no cell phone speaker is going to produce any bass whatsoever, so plan on using all tweet and no boom.
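The Nyquist limit is simple arithmetic: a given sample rate can only represent frequencies up to half that rate. A quick Python sketch follows; the helper names are illustrative, and the piano figure is the fundamental of the top note (C8, about 4,186Hz), ignoring the overtones above it.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def fits(highest_frequency_hz, sample_rate_hz):
    """True if a sound's highest frequency survives at this sample rate."""
    return highest_frequency_hz <= nyquist_hz(sample_rate_hz)


PIANO_TOP_NOTE_HZ = 4186  # fundamental of C8, the highest piano key

for rate in (8000, 11025, 44100):
    print(rate, nyquist_hz(rate), fits(PIANO_TOP_NOTE_HZ, rate))
```

At 11kHz everything above roughly 5.5kHz is gone, so the fundamentals of a piano fit but the sizzle of a distorted guitar or cymbals does not.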
Most important, you'll need to know what your audio budget is—how much space is being allocated for sound. In most cases, if you want to make music, you're going to have to use General MIDI played on the internal wavetable, simply because it provides by far the most audio bang for your digital buck. But do yourself a favor and write for the basics—piano, horn, kick, snare, maybe some strings—and avoid things like the goblin pad, or the shakuhachi, and the other lesser-known patches. You have no idea how those instruments will sound on different devices and different soundbanks, whereas a piano is pretty much going to sound like a piano no matter where you go. In fact, solo piano music is usually a pretty safe bet in a General MIDI world, and I would suggest that being constrained to a single instrument didn't seem to hinder Bach's, Beethoven's, or Chopin's creativity.
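One way to enforce the "write for the basics" advice is a small check over the General MIDI program numbers a sequence uses. The program numbers below are from the General MIDI instrument list (numbered 1 to 128), but which patches count as "safe" is purely an assumption for this sketch, as is the helper name.

```python
# General MIDI program numbers (1-based). The choice of which patches
# are "safe" is an assumption for this example, not a standard.
SAFE_PROGRAMS = {
    1: "Acoustic Grand Piano",
    49: "String Ensemble 1",
    57: "Trumpet",
    61: "French Horn",
}

RISKY_EXAMPLES = {
    78: "Shakuhachi",
    102: "FX 6 (goblins)",
}


def risky_programs(programs_used):
    """Return the programs in a sequence that fall outside the safe list."""
    return sorted(p for p in set(programs_used) if p not in SAFE_PROGRAMS)


print(risky_programs([1, 57, 78, 102, 1]))  # [78, 102]
```

Kick and snare are not in the table because General MIDI percussion lives on channel 10 as note numbers rather than as program changes.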
Truth be told, none of the techniques I've described today are new or inventive. I am certainly not an innovative genius or a radical visionary, I've just been doing this stuff for a while. In fact, creating audio for mobile games these days is strikingly similar to producing audio for PC games in the '80s and doing web audio in the '90s. The same kind of audio problems and solutions will apparently arise in almost any resource-constrained, developing, competitive environment.
But since we've been here before (twice!), you'd think we would've learned a lesson or two. We'd know that closed, proprietary audio formats are bad and open standards are good.
We'd know that bandwidth bottlenecks will expand, and so we'd plan for scalability now.
We'd know that all DRM schemes are doomed, and that the best way to make a buck is to give the customer what they want, not criminalize what they're going to do anyway.
But most important, we'd know that the same techniques that have worked for us in the past will be useful today and in the future. In other words, you want interesting, well-produced soundtracks for your mobile games? Hire old game-audio guys!
By the way, these "lessons learned" are not mine, they are just a few from the 2004 Project Bar-B-Q Mobile Audio session.
But finally, my friends, I'm going to stand here and tell you that the mobile industry moves so fast that everything I've said today about creating soundtracks for mobile games is already complete and utter bullshit. And I'll tell you why: convergence.
The advent of music phones with gigabyte removable storage and broadband network connections is going to make mobile game music completely obsolete.
Really, think about it for a minute: games on your cell phones are not like games on your Xbox, or even your PSP. Given the bandwidth restrictions and the CPU usage, mobile games tend to be small, fun, time-killers. They're not 40-hour immersive environments like God of War; they're what you do while you're waiting for something else to happen in your life. And because of the technical limitations, and because they're cheap, and because, let's face it, folks, they're phones, not dedicated gaming platforms, mobile game soundtracks are going to be kinda low-res no matter what you do.
And this is all well and good when that's all you've got. But what happens when that same device, that cool little gadget that takes pictures, does email, surfs the Web, plays games, and (oh yeah) makes phone calls, also contains six hours of your favorite music? Ask yourself what you would prefer to listen to while you kill a little time playing a little game: some low-res MIDI soundtrack written by who knows who (and that would be me in this case), or that funky-cool Grammy-winning groove you just uploaded to your phone from iTunes? Come on, it's no contest.
And I know this because I've done it, and it's too damn cool for school. It's "roll your own game audio"; it's Cheese Racer with an Aerosmith soundtrack, or the Greenskeepers, or the New York Philharmonic, or whatever you want. I'm tellin' ya, it's the best thing since flavored toothpaste, and pretty soon aaalll the cool kids are going to be doing it. Some Motorola phones already have iTunes built in, and the cell phone carriers are all jumping on board the "music phone" bandwagon like it was headed for Gold Country. Personally, I just took my custom black Hiptop and slapped an iPod nano on the back with duct tape (because I'm a musician, and all musicians must use duct tape) and voilà! (See Figure 12.)
Figure 12. The nanoHiptop (Duct Tape Edition) is the essence of convergence: two refined tools fused into one.
Introducing the nanoHiptop (not a real product, not available in stores, not condoned by either Danger or Apple Computer, do not try this at home, may void warranty). Nonetheless, check it out, it's a cool concept, it works great—and it represents a paradigm shift in the way we're going to think about creating audio for mobile games.
The day after I made this presentation, I attended Brian Schmidt's Xbox 360 audio session at GDC, and after my head stopped spinning from the unbelievable power of the system (322 simultaneous voices of digital audio?!), I discovered that not only does the hardware have a port for plugging in your iPod, but all titles for the console are required to let gamers use their own playlists as background music. The game's background music stops, but the sound effects keep going and the iPod output is mixed in ... brilliant!
This may be the first time that high-end console games and low-end mobile games share a common functionality—allowing the user to customize their own audio experience. I'm tellin' ya, folks, it's the wave of the future.