Unix Backup and Recovery
By W. Curtis Preston
November 1999
Pages: 734



Table of Contents

Chapter 1: Preparing for the Worst
One of the simplest rules of systems administration is that disks and systems fail. If you haven't already lost a system or at least a disk drive, consider yourself extremely lucky. You also might consider the statistical possibility that your time is coming really soon. Maybe it's just me, but I lost four laptop disk drives while trying to write this book! (Yes, I had them backed up.)
This chapter talks about developing an overall disaster recovery plan, of which your backup and recovery system will be just a part.
My Dad Was Right
My father used to tell me, "There are two types of motorcycle owners. Those who have fallen, and those who will fall." The same rule applies to system administrators. There are those who have lost a disk drive and those who will lose a disk drive. (I'm sure my dad was just trying to keep me from buying a motorcycle, but the logic still applies. That's not bad for a guy who got his first computer last year, don't you think?)
Whenever I speak about my favorite subject at conferences, I always ask questions like, "Who has ever lost a disk drive?" or "Who has lost an entire system?" In fact, I wrote this chapter while at a conference. When I asked those questions there, someone raised his hand and said, "My computer room just got struck by lightning." That sure made for an interesting discussion! If you haven't lost a system, look around you . . . one of your friends has.
Speaking of old adages, the one that says "It'll never happen to me" applies here as well. Ask anyone who's been mugged if they thought it would happen to them. Ask anyone who's been in a car accident if they ever thought it would happen to them. Ask the guy whose computer room was struck by lightning if he thought it would ever happen to him. The answer is always "No."
While the title of this book is Unix Backup & Recovery, the whole reason you are making these backups is so that you will be able to recover from some level of disaster. Whether it's a user who has accidentally or maliciously damaged something or a tornado that has taken out your entire server room, the only way you are going to recover is by having a good, complete disaster recovery plan that is based on a solid backup and recovery system.
Neither can exist completely without the other. If you have a great backup system but aren't storing your media off-site, you'll be sorry when that tornado hits. You may have the most well-organized, well-protected set of backup volumes, but they won't be of any help if your backup and recovery system hasn't properly stored the data on those volumes. Getting good backups may be an early step in your disaster recovery plan, but the rest of that plan (organizing and protecting those backups against a disaster) should follow soon after. Although the task may seem daunting, it's not impossible.
Developing a Disaster Recovery Plan
Devising a good disaster recovery plan is hard work. You need to build it from the ground up, and it can take months or even years to perfect. Since computer environments are changing constantly, you continually have to test your plan to make sure it still works with your changing environment.
This chapter is not meant to be a comprehensive guide to disaster recovery planning. There are books dedicated to just that topic, and before you attempt to design your own disaster recovery plan, I strongly advise you to research this topic further. This chapter gives an overview of the steps necessary to complete such a plan and discusses a few details that are typically left out of other books. It provides a frame of reference upon which the rest of the book will be based.
There are essentially six steps to designing a complete disaster recovery plan. While you may work on several steps simultaneously, the order listed here is very important. Don't jump into the design stage before understanding what level of risk your company is willing to take or what types of disasters the plan needs to address. Likewise, what good does it do to have a well-documented, well-organized disaster recovery plan based on a backup system that doesn't work? The six steps are as follows:
  1. Define (un)acceptable loss.
    Before you develop a disaster recovery plan, decide how much you will lose if you don't. That will help you decide how much time, effort, and money to spend on a disaster recovery plan.
  2. Back up everything.
    You have to make sure that everything is backed up—including data, metadata, and the instructions you'll need to get them back.
  3. Organize everything.
    You have everything on backup volumes. But can you find the volume you need when disaster strikes? The key to being able to find your backups is organization.
Step 1: Define (Un)acceptable Loss
A disaster recovery plan is an insurance policy. If you've ever read anything about backups, you've heard that before. I would like to extend that analogy. Consider your car insurance policy. All insurance policies in the United States start with PIP, or personal injury protection. That way if you hit someone and get sued, you are protected. You can then add coverage for collision, personal property, emergency roadside assistance, and rental car coverage. These additional layers of coverage are called riders. Just like your car insurance policy, disaster recovery plans may include optional riders. You simply need to decide the types of riders that your company needs, or can afford. How do you do this? You have to look at the potential losses that your company will suffer if a disaster occurs and decide which ones are acceptable or unacceptable, as the case may be. You then select the riders that will protect you against the losses that you have decided are unacceptable. (This analogy is discussed in further detail in Chapter 2.)
You need to make the same kind of decisions on behalf of your company. If it is unacceptable to lose a single day's worth of data when a disaster happens, then you need to send your volumes to an off-site storage vendor every single day. You must decide what kind of losses your company is not willing to accept, and then insure against those losses with your disaster recovery plan. You cannot design a disaster recovery plan without this step. Every decision that you must make will be based on the information you discover during this analysis. Doing otherwise might cause you to purchase riders that you don't need or to leave out ones that you do need.
What is considered an acceptable loss for office automation data may not be acceptable for your customer database. Some data can be re-created with enough effort, while other data is irreplaceable. Look at each type of data that you have and decide whether it can be re-created.
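The rule stated earlier in this step (if losing a day's worth of data is unacceptable, volumes must go off-site every day) generalizes: volumes that have not yet left the building share the fate of the building, so the interval between off-site pickups should not exceed the loss you have decided to accept for that type of data. The Python sketch below illustrates that reasoning under stated assumptions; the pickup schedules, the `DataType` class, and the example data types and loss windows are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical pickup schedules an off-site storage vendor might offer, in hours.
VENDOR_PICKUP_INTERVALS_HOURS = (24 * 7, 24, 12)  # weekly, daily, twice daily


@dataclass
class DataType:
    name: str
    max_loss_hours: float  # the most work you have decided you can afford to lose


def offsite_interval_hours(data: DataType) -> int:
    """Pick the least frequent pickup schedule that stays inside the loss window.

    Anything written since the last pickup exists only on-site, so a site-wide
    disaster can take roughly that many hours of data with it.
    """
    fitting = [h for h in VENDOR_PICKUP_INTERVALS_HOURS if h <= data.max_loss_hours]
    if not fitting:
        # Shipping volumes off-site alone cannot meet this requirement.
        raise ValueError(f"{data.name}: no pickup schedule fits {data.max_loss_hours} hours")
    return max(fitting)


for dt in (DataType("office documents", 24 * 7),
           DataType("customer database", 24),
           DataType("order entry", 4)):
    try:
        print(dt.name, offsite_interval_hours(dt))
    except ValueError as err:
        print(err)
```

The error in the third case is deliberate: when no pickup schedule fits, shipping tapes cannot by itself deliver the level of protection you decided that data needs.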
Step 2: Back Up Everything
This sounds like a given, right? It's not. Certain types of data typically are excluded or forgotten. Many companies cut corners by omitting certain types of data from their backups. For example, by excluding the operating system from your backups, you may save a little media. However, if you find yourself in need of the old /etc/fstab, you will be out of luck. You may save some money, but you also may be putting your company at risk. It's easier and safer just to back up everything. There also may be types of data that are forgotten completely. The most common mistake is to back up the data on a system but not to get a "picture" of what the system itself looks like in case you have to rebuild it.
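One way to get that "picture" of a system is to record its identity and copies of its key configuration files, then back up the result along with everything else. The Python sketch below assumes a short, hypothetical list of files. It is a starting point, not a complete inventory: a real rebuild also needs disk partitioning and volume manager layouts, which only OS-specific tools can report.

```python
import json
import platform
from pathlib import Path

# Hypothetical starting list; extend it with whatever your OS needs for a rebuild.
DEFAULT_CONFIG_FILES = ("/etc/fstab", "/etc/hosts", "/etc/passwd", "/etc/group")


def capture_system_picture(config_files=DEFAULT_CONFIG_FILES):
    """Record what this system looks like so it can be rebuilt after a disaster."""
    uname = platform.uname()
    picture = {
        "hostname": uname.node,
        "os": uname.system,
        "release": uname.release,
        "machine": uname.machine,
        "files": {},    # path -> file contents
        "missing": [],  # paths that did not exist or could not be read
    }
    for name in config_files:
        try:
            picture["files"][name] = Path(name).read_text(errors="replace")
        except OSError:
            picture["missing"].append(name)
    return picture


if __name__ == "__main__":
    # Save this output somewhere that is itself backed up, and keep a printed copy.
    print(json.dumps(capture_system_picture(), indent=2))
```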
It is best to have a system that automatically backs up everything, except for a few explicit exceptions specified on an exclude list. If your backup system requires you to update an include list every time a new filesystem is added, you may forget or you may add it incorrectly; the result is that the filesystem does not get backed up. In a disaster, this means the data never comes back. This is why I prefer backup products that automatically back up all filesystems. (The concept of include and exclude lists is covered in Chapter 2.)
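A minimal sketch of the exclude-list approach follows, in Python. The filesystem names and exclude patterns are hypothetical, and `read_mount_points()` assumes a Linux-style /proc/mounts; on a real host you would also exclude pseudo-filesystems such as /proc. The point of the design is that nothing needs updating when a filesystem is added: it gets backed up unless someone deliberately excludes it.

```python
from fnmatch import fnmatch


def read_mount_points(mounts_file="/proc/mounts"):
    """List mount points from the kernel's mount table (Linux-specific assumption)."""
    with open(mounts_file) as f:
        return [line.split()[1] for line in f if line.strip()]


def filesystems_to_back_up(mounted, exclude_patterns):
    """Back up every mounted filesystem except those matching an explicit exclusion.

    A filesystem added tomorrow is included automatically; leaving one out
    requires a deliberate entry in exclude_patterns.
    """
    return [fs for fs in mounted
            if not any(fnmatch(fs, pattern) for pattern in exclude_patterns)]


# Hypothetical example: /data2 was mounted last week and nobody told the backup admin.
mounted = ["/", "/usr", "/var", "/home", "/tmp", "/data2"]
print(filesystems_to_back_up(mounted, exclude_patterns=["/tmp", "/scratch*"]))
```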
Backing up a database requires more work than backing up a normal filesystem. (Actual database backup procedures are covered in Part V of this book.) Theoretically, if you are backing up everything in your filesystems and you are backing up your databases in some manner, you should be able to recover from a disaster. Unfortunately, there are scenarios in which you might leave out an essential piece of the disaster recovery puzzle. The only way to be sure that you are prepared to recover your databases in case of a disaster is to test restoring them to another machine.
In fact, a previous version of my Oracle backup script (see Chapter 15) did not back up the online redo logs during a hot backup. All my backup and recovery tests worked fine until I attempted to restore the database to a different system. We were able to restore all the database files, but the database needed the redo logs to complete the recovery. Since we had not backed up the redo logs, we did not have them to restore. You see, when I was recovering the database to the same system, the redo logs were always there. (Of course, I immediately changed the script to address this problem.)
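A restore to the original host can hide gaps like this, because files that were never backed up are still sitting on its disks. A cheap complement to a test restore on another machine is to check each backup's manifest against the file categories that such a restore turned out to need. The Python sketch below assumes your backup tool can list a backup's contents as (path, category) pairs; the category names and paths are hypothetical, and the check does not replace the restore test itself.

```python
# Hypothetical categories that a test restore to a different machine showed were
# required; build your own list from your own restore tests.
REQUIRED_CATEGORIES = frozenset({"datafiles", "control_files", "redo_logs", "parameter_file"})


def missing_categories(manifest, required=REQUIRED_CATEGORIES):
    """Return required categories that have no files in this backup's manifest.

    manifest is an iterable of (path, category) pairs describing one backup.
    """
    present = {category for _path, category in manifest}
    return sorted(required - present)


last_night = [
    ("/oradata/PROD/system01.dbf", "datafiles"),
    ("/oradata/PROD/users01.dbf", "datafiles"),
    ("/oradata/PROD/control01.ctl", "control_files"),
    ("/oracle/admin/PROD/pfile/initPROD.ora", "parameter_file"),
]
print(missing_categories(last_night))  # ['redo_logs']
```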
Step 3: Organize Everything
Good organization is really the key to a good disaster recovery plan. If you have hundreds or thousands of backup volumes but can't find them when you need them, what good are they? There is also the physical layout of the servers themselves. If they are all laid out in a standard way, recovering from a disaster is a whole lot simpler than if each server has its own unique layout.
Standardizing the layout of your servers is one of the more difficult things to do, since server configurations and OS configurations change over time. Look at the following list for some of the ways you can standardize, and standardize where you can. Experience has shown that it is worth the trouble to go back and restandardize, that is, to reimplement your new standard on your old servers.
The root disk
This should be your standard everywhere. Keep your OS on one disk if possible. Recovering an OS that is spread out on multiple disks is very difficult. Also, keep the partitioning (or LVM partitioning) of all of your OS disks consistent. You don't want to have to remember, "Oh yeah, this is the one with 1 MB of swap . . ."
Same-size disks
Partition all of your same-size disks exactly the same way, if possible. Consistency makes swapping them in and out very easy and gives you a lot of flexibility.
Same-function disks
If you have disks that serve the same purpose, partition them in the same way.
Database data disk
Decide on the best way to partition your database data disks, and partition all of them in the same way. For example, you might decide to fit as many 2 GB partitions as you can onto the disk. Anything left over can be used for those small databases that are always lurking around.
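To audit how well existing servers match such standards, you could keep a table of agreed layouts per disk size and function and compare each disk against it. The Python sketch below does that; the layouts, sizes, and device names are hypothetical (the database layout follows the "as many 2 GB partitions as fit" example above), and gathering the real partition tables is left to your OS's own tools.

```python
# Hypothetical standards: (disk size in GB, disk function) -> partition sizes in GB.
STANDARD_LAYOUTS = {
    (9, "root"): (1, 1, 2, 5),
    (9, "dbdata"): (2, 2, 2, 2, 1),  # as many 2 GB slices as fit; leftover for small DBs
}


def audit_layouts(disks, standards=STANDARD_LAYOUTS):
    """Compare each disk's partition layout with the standard for its class.

    disks is an iterable of (name, size_gb, function, partition_sizes_gb).
    Returns (nonstandard, unclassified): disks that deviate from their standard,
    and disks whose (size, function) class has no standard defined yet.
    """
    nonstandard, unclassified = [], []
    for name, size_gb, function, partitions in disks:
        standard = standards.get((size_gb, function))
        if standard is None:
            unclassified.append(name)
        elif tuple(partitions) != standard:
            nonstandard.append(name)
    return sorted(nonstandard), sorted(unclassified)


inventory = [
    ("c0t0d0", 9, "root", [1, 1, 2, 5]),
    ("c1t0d0", 9, "dbdata", [2, 2, 2, 2, 1]),
    ("c1t1d0", 9, "dbdata", [4, 4, 1]),    # somebody's one-off layout
    ("c2t0d0", 18, "dbdata", [2] * 9),     # new disk size, no standard written yet
]
print(audit_layouts(inventory))
```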
Step 4: Protect Against Disasters
What types of disasters strike your area? I grew up in an area in which an entire city block dropped into a sinkhole. Shortly after that, we were hit by Hurricane David. Floods, tornadoes, and earthquakes hit other parts of the world. Your disaster recovery setup should be designed to protect against the types of disasters that affect your area.
You need to get a copy of the Disaster Recovery Yellow Pages. This is one of the most useful references that I have seen. These folks have combed the yellow pages of hundreds of cities and found literally thousands of companies that can help you with every phase of disaster recovery planning. They have everything from A to Z, including every kind of company that you could possibly need to recover from a disaster. There are emergency communication services, fire damage reclamation services, emergency medical services, emergency equipment suppliers, and anything else you can imagine. Some of these companies even have computer rooms on trucks that are able to roll out at a moment's notice. The Disaster Recovery Yellow Pages publishers have been told by a number of customers that a mere scan of their table of contents has made them rethink their disaster recovery plan. Get yourself a copy for your computer room and one for your vault. Send email to dryp@datablast.com for a complete table of contents.
Everyone knows that the best place to store your media is not in your computer room, next to the computer being backed up. Yet, that is the most common place where media is stored. You need to do something to protect the media that backs up your computers, or that media will be useless when disaster strikes.

Section 1.6.1.1: On-site vault systems

There are a number of fire-rated media vaults that you can use to protect your media against fire. This is the best protection for media that is to be stored on-site. Be forewarned, though: they are expensive. Contact Wrightline, Inc., for more information (
Step 5: Document What You Have Done
While you are working your way through these steps, and certainly once your disaster recovery plan is complete, get it all down in writing. Document every procedure that you can. This is necessary to recover from a disaster—and to recover from the loss of an essential person. (You never know when someone might win the lottery.)
Again, there are a number of documentation formats. Choose the one that makes the most sense to you.
HTML
This is the format of choice for disaster recovery documentation. It is readable on any platform with a browser and therefore extremely portable. You don't even have to edit raw HTML anymore, since any modern word processor can save as HTML, which makes writing documentation in HTML much easier. Just make sure that the pages still work if the hostname changes. For example, use relative links rather than absolute URLs that point to a particular server. The one downside to using HTML is that it can take up more space than the other options discussed here.
PDF
The two positive things about the Adobe PDF format are its size and its truly platform-independent nature. However, it is not editable in its native format, and not everyone has a PDF reader installed. Still, the PDF format may be a good choice for you, as long as you are aware of its limitations.
Word processor
The word processor format is probably the easiest to manage of all these options. The only difficult part is getting a reader; however, if you choose the Microsoft Word format, any Windows laptop can read it with WordPad. The real issue with this format is portability. Although there are applications that can read Word files on Unix, you would have to obtain one before a disaster, so I would suggest a more portable format.
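The HTML advice above (relative links instead of links tied to one particular server) is easy to check mechanically. A minimal sketch using only Python's standard library follows; it flags any href or src attribute that names a host. The sample page and hostname are hypothetical, and root-relative links such as /dr/images/... are deliberately not flagged because they survive a hostname change.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HostBoundLinkFinder(HTMLParser):
    """Collect href/src values that name a specific host rather than a relative path."""

    def __init__(self):
        super().__init__()
        self.host_bound = []

    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            # urlsplit() fills in netloc only when the link names a host,
            # e.g. "http://backup01/dr/" or "//backup01/dr/".
            if attr in ("href", "src") and value and urlsplit(value).netloc:
                self.host_bound.append(value)


def find_host_bound_links(html_text):
    finder = HostBoundLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.host_bound


page = """
<a href="restore-root-disk.html">Restoring the root disk</a>
<a href="http://backup01.example.com/dr/vendors.html">Vendor contacts</a>
<img src="/dr/images/rack-layout.gif">
"""
print(find_host_bound_links(page))
```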
Step 6: Test, Test, Test
The key to successfully recovering from a real disaster is to test your disaster recovery plan. The point of testing is to find things that need updating, and you will always find them. If you find a bad link in your disaster recovery plan, then fix it. Do not consider this test a failure. In fact, you might consider a test that finds nothing wrong to be a failure.
Have a stranger test procedures
Don't have the person who wrote the procedure test the procedure. Have someone who is competent, but unfamiliar with your systems, do the test. Perhaps you can hire a consultant to test your procedures; they should be written so that such a person can follow them. Not only is this a great way to find loopholes in your procedures, it is also a great way to test what would happen if you lost some essential personnel.
Dream up disasters
This is the fun part. Ask the most pessimistic person you know to dream up disasters for you. See if he can come up with one that you haven't planned for.
Full-test every six months
This is what the contracts of many disaster recovery companies require. Such a test should take a day or so and is well worth your time. One of the problems with this is the availability of personnel. Again, hiring consultants is a good way to get this test done. Just don't use all consultants and no company personnel, because then nobody in-house will learn much from the test.
D/R companies will require a test
This is a great way to force you to do a test. If you have a contract with a disaster recovery company, they will require you to test your plan. If you don't test your plan, you are in breach of contract and the D/R company cannot be held responsible. There's something about paying money to a company for nothing that forces you to do what they want you to do—test!
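Keeping to a six-month full-test cycle is partly a matter of bookkeeping. The Python sketch below computes whether a full test is overdue; the function names are hypothetical, and "six months" is interpreted as six calendar months, with the due date clamped to the last day of a shorter month.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Move a date forward by whole months, clamping to the end of shorter months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def full_test_overdue(last_full_test, today, interval_months=6):
    """True if more than interval_months have passed since the last full test."""
    return today > add_months(last_full_test, interval_months)


print(add_months(date(1999, 8, 31), 6))  # 2000-02-29 (2000 is a leap year)
print(full_test_overdue(date(1999, 1, 15), date(1999, 9, 1)))  # True
```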
Put It All Together
This chapter merely scratches the surface of disaster recovery planning. There are other books on the subject; look for books in print that have "disaster recovery" in their titles. Remember that prior proper planning prevents pitifully poor performance during a disaster that destroys, demolishes, and devastates your company. The chapters that follow describe in detail one element of a disaster recovery plan—the backup and recovery of your data.
Chapter 2: Backing It All Up
In Chapter 1, we looked at disaster recovery as a whole. The nuts and bolts of backup and recovery are but a small part of the overall disaster recovery picture. Before we begin looking at the details of how to perform certain types of backups, let's look at backups in general.
Don't Skip This Chapter!
The casual reader might assume that this chapter is an introduction to basic backup concepts. While that is, in fact, the purpose of this chapter, it is also true that many seasoned administrators are unfamiliar with the ideas presented here. One reason for this is that administrators find themselves constantly being pulled away from "mundane" activities like backups for things that are thought to be more "important"—like installing new servers and figuring out why the systems are running slowly. Also, many administrators may go several years without ever needing a restore. (The need to use your backups on a regular basis would undoubtedly change your ideas about their importance.)
I wrote this book because backups (and recoveries) have been my primary area of emphasis for several years, and I would like to share the lessons I've learned from this focused activity. This chapter provides an overview of how your backups should work. It also explains many basic, yet extremely important, concepts upon which any good backup plan should be based and upon which any implementation discussed in this book will be based.
There are many stories in this book, like the one in the following sidebar. Each is a true story that really happened to someone I know. These are not urban legends or horror stories passed on from admin to admin. These are firsthand encounters with disaster. Why is that important? Each story makes a point, and it was not just made up to make that point. The things that I warn about in this book really happen. This can be a very tough job if you are not prepared, so read closely.
Why Should You Read This Book?
If you've been doing system administration for some time, you may be asking yourself this question. There are many answers. Perhaps self-preservation is your primary motivator. You'd like to make sure you don't lose your job the next time that a disk drive goes south. Perhaps you've already got a decent backup system, but you'd just like to make it better. Maybe you are looking for some new ideas on how to deal with upcoming backup and recovery needs. What follows are some of the reasons I think you should read it.
"We lost only a few days' worth of data." I swore the day I said that that I would never say those words again. From that day forward, I was convinced of the importance of backups. I never again assumed anything, and I began to study everything I could about backup technology. This book represents my attempt to compile what I have learned into a single volume, and it is written so that no one who reads it should ever need to utter the preceding statement. In my opinion, no amount of data loss is acceptable. I would also wager that you would be hard-pressed to find an end user who would feel much differently. Whether it's a spreadsheet that one person created or a customer database representing hours or days of sales invoices and the efforts of hundreds of people, ask the person who needs the data how much data loss they think is acceptable. Every statement, every opinion, every story, and every chapter in this book is based on the premise that any data loss is unacceptable. Let me state that again for emphasis.
With the technology that is now available, there is no reason for any data to be lost—if backups are given the proper attention and priority that they need.
If you've been doing backups for a while, you know that this hasn't always been the case. Just a few years ago, if you couldn't do it with dump, tar, cpio, and your standard database backup utilities, you couldn't do it. The demand for midrange computers has grown astronomically in the last few years, and the need for bigger databases, larger filesystems, long filenames, and long pathnames has grown proportionally. As things typically go in the backup world, large filesystems and huge databases were designed and shipped long before the utilities to back them up effectively were available. This created a large market for commercial backup utilities: one or two such products emerged, and scores of others eventually followed.
How Serious Is Your Company About Backups?
I've heard it all. I've been accused of caring only about backups. It's been said that I think the whole world revolves around a cartridge reel. I've said that someday the world's going to crash, and I'm going to have the backup. The question is: how serious are you about protecting your data? To help you come to a decision in this matter, let's talk about what will happen if you don't have good backups.
To answer this question, you need to consider what kind of data you are backing up. This is a perfect time to include people who may not consider themselves computer people. Get input from other departments to answer this question. When all those 1s and 0s come together, just what kind of stuff are we talking about? Do you use manual accounting methods, or are your company's financial records stored in some accounting software somewhere? When a customer calls in and orders something, do you jot that down on a carbon-copied order form, or do you enter it in some sort of order processing program? What about things like budgets, memoranda, inventories, and any other "paperwork" that you throw around from day to day? Do you keep copies of every important memo that you send, or do you depend on the computer for that?
If you're like most people, you have grown quite dependent on these things we call computers. You forget how much of your work has been saved in the form of little magnetized bits spread out across a bunch of spinning platters. Maybe you work in an environment in which you've never lost a disk, so you've never had to do a restore. Maybe you've never fat-fingered a key and deleted an important file. If that's the case, then remember what my dad used to say. Motorcycle riders come in two types—those who have fallen and those who will fall. The same is true of disk drives. If the rabid dog of disaster hasn't bitten you, trust me, it's scratching at your door right now!
So what would you lose if you lost data? To quantify this, we need to examine the types of systems that may reside in your environment. Most of what you could lose is very tangible—and quantifiable in monetary terms—and might surprise you.
You Can Find a Balance
Using a system that has no backups is like driving a car 100 miles an hour down a busy road the day after your insurance policy expires. Likewise, having a three-node, highly available cluster for a noncritical application is like having full coverage on your 20-year-old fifth car. Just as insurance plans have different levels of coverage and riders to cover various types of damage, different backup methodologies provide different levels of recoverability.
Not all environments need up-to-the-minute data recoverability. For many environments, recovering the systems up to last night's backups is acceptable. For some environments, recovering the system even up to last week or month is OK. Spending thousands of dollars and hundreds of hours implementing the greatest backup solution in the world is a waste—if you don't need that level of coverage. This usually is not the problem for most sites; on the contrary, most sites don't spend nearly enough money or effort on their backup and recovery system. In other cases, however, money sometimes is wasted on an unnecessarily elaborate system.
Recoverability requirements also vary from machine to machine within the same company. The amount of work that would be lost, or the possibility of adversely affecting a customer, may determine these requirements. For example, it may be considered acceptable for an employee or two to lose a day's work spent on a few word processing documents. That is, unless it was your Senior Vice President's secretary who was working on the departmental budget, in which case your mileage may vary. And it would probably be totally unacceptable for you to lose even one hour's worth of entries into the company-wide sales database used by hundreds of people.
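The arithmetic behind these judgments is simple: the work at risk is roughly the number of people whose changes live on a system multiplied by the working hours since the last point you can recover to. The Python sketch below uses hypothetical figures (the user counts and the six-hour gap are illustrative, not from any real site).

```python
def hours_of_work_at_risk(active_users, work_hours_since_recovery_point):
    """Rough exposure: each active user loses the work done since the last recoverable copy."""
    return active_users * work_hours_since_recovery_point


# Hypothetical figures: a mid-afternoon failure, six working hours after last night's backup.
print(hours_of_work_at_risk(1, 6))     # one user's local disk: 6
print(hours_of_work_at_risk(300, 6))   # a server shared by 300 people: 1800
print(hours_of_work_at_risk(300, 1))   # same server, recoverable to within the hour: 300
```

The same failure costs hundreds of times more on a shared server, which is why the recoverability requirement, and therefore the backup method, differs from machine to machine.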
The point is that your backup requirements are determined by your recoverability requirements. The difficulty comes in finding (and using) a tool capable of providing you with the level of recoverability that you need. Consider users' home directories for a minute. If they are local to each user's workstation, a loss of one user's disk in the afternoon would mean that one user would lose a few hours of work. However, if user directories are located on an NFS file server that serves thousands of users, you could potentially lose several thousand hours of work if you use only traditional backup tools. If that loss would be considered unacceptable, then you need to examine the newest trend in backups—the
Deciding What to Back Up
Experience shows that one of the most common causes of data loss is that the lost data was never configured to be backed up. The decision of what to back up is an important one.
When trying to decide what files to include in your backups, take the most pessimistic technical person in your company out to lunch. In fact, get a few of them together. Ask them to come up with scenarios that you should protect against. Use these scenarios in deciding what should be included, and they will help you plan the "how" section as well. Ask your guests, "What are the absolute worst scenarios that could cause data loss?" Here are some possible answers:
  • An entire system catches fire and melts to the ground, leaving an unrecognizable mass of molten metal and blackened, smoking plastic.
  • Since this machine was so important, you had it replicated to another node right next to it. Of course, that machine catches fire right along with this one.
  • You have a centralized server that controls all backups and keeps a record of backup volume locations and what files are on what volumes, and so on. The server that blew up sits right next to this "backup server," and the intense heat took this system with it.
  • The disastrous chain reaction continues, taking out your DHCP server, NIS master server, NFS home directory server, NFS application server, and the database server where you house the inventory of all your backup volumes with their respective locations. This computer also holds the telephone database listing all service agreements, vendor telephone numbers, and escalation procedures.
  • You haven't memorized the number to your new off-site storage vendor yet, so it's taped to the wall next to your backup server. You realize, of course, that the flames just burnt that paper beyond recognition.
Deciding When to Back Up
This might appear to be the most straightforward topic. Everybody backs up their system every night, right? What's the big deal? Actually, this could more aptly be titled "What levels do I run when?" It's always a big question. How often do you run a full backup? How often do you run incremental backups? Do you run various levels of incrementals that back up just today's changes or continuous incremental backups that back up everything since the last full backup? Everyone has his own answers to these questions. The only thing that is definite is that there should be at least some level of incremental backup every night. Before any further discussion on the topic, let's define some terms.
The following are various backup levels:
Level 0
A full backup.
Level 1
An incremental backup that backs up everything that has changed since the last level 0 backup.
Levels 2-9
Each level backs up whatever has changed since the last backup of the next lowest level, e.g., a level 2 backs up everything that changed since a level 1, or since a level 0, if there is no level 1.
Incremental
Usually, a backup that behaves like levels 1-9. Also used by some products to mean the same as a level 1, backing up all changes since a level 0.
Differential
A type of "incremental" backup, in the generic sense, which backs up only what has changed since the last differential. This term is usually found in software that does not use the numbered level concept. Such software would use the terms "full," "incremental," and "differential." In this type of setup, you would use a full backup to get the entire system, an occasional incremental to get all changes since that full backup, and differentials each night to catch only the changes since the previous day. In some backup products, repeated level 9s will act like differential backups.
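The numbered-level rule can be made concrete with a short sketch. It assumes classic dump semantics, where a level-n backup copies whatever changed since the most recent earlier backup at any lower level. The `Backup` record and the `reference_backup` function are illustrative names, not part of any real tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Backup:
    when: date  # when the backup ran
    level: int  # 0 = full, 1-9 = incremental levels


def reference_backup(history: Sequence[Backup], level: int) -> Optional[Backup]:
    """Return the earlier backup that a new backup at `level` is measured against.

    Uses dump-style semantics: a level-n backup copies everything changed since
    the most recent backup whose level is lower than n. Returns None when there
    is no such backup (always the case for level 0, which copies everything).
    """
    if not 0 <= level <= 9:
        raise ValueError("backup levels run from 0 to 9")
    # Walk the history newest first and stop at the first lower-level backup.
    for x in sorted(history, key=lambda x: x.when, reverse=True):
        if x.level < level:
            return x
    return None


history = [
    Backup(date(1999, 11, 1), 0),
    Backup(date(1999, 11, 2), 1),
    Backup(date(1999, 11, 3), 2),
]
print(reference_backup(history, 2).when)  # 1999-11-02, the level 1
print(reference_backup(history, 1).when)  # 1999-11-01, the level 0
```

With this sample history, a new level 2 is measured against the level 1 of November 2, and a new level 1 against the level 0 of November 1. If the level 1 were missing, the level 2 would fall back to the level 0, as the definitions describe.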
Deciding How to Back Up
Once you've decided when you're going to back up, you have to decide how you are going to back up the data. But first, look at what types of problems you are protecting yourself from.
As stated earlier, how you want to do your restores determines how you want to do your backups. One of the questions that you must ask yourself is, "What are you going to protect yourself from?" Are the users in your environment all "power users" who use their computers intelligently and never make dumb mistakes? Would your company lose a lot of essential data if the files on your users' PCs were accidentally deleted? If a hurricane takes out your whole company, would it be able to continue doing business? Make sure that you are aware of all the potential causes of data loss, then make sure your backup methods cover every one of them that you want to protect yourself from. The most exhaustive list of potential causes of data loss that I have seen was in another O'Reilly book called Practical Unix and Internet Security, by Simson Garfinkel and Gene Spafford. Their list, with my comments attached, follows:
User error
This has been, by far, the biggest percentage of restores in every environment that I have seen. "Hey, I was sklocking my flambality file, and I accidentally pressed the jankle button. Can you restore it, please?" This one is pretty easy, right? What about the common question: "Can you restore it as of about an hour ago?" There is one backup method that can handle this. There are systems that come with snapshot technology built in. There is at least one software product that can give you this capability on a standard Unix box. Snapshots, which are discussed in Chapter 19, give you the ability to do what users already think you're doing: backing the servers up all day, as often as you want. You can do this with almost no CPU overhead if you have the right software solution.
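Snapshots can answer "as of an hour ago" only as precisely as the snapshot schedule allows. The sketch below assumes a hypothetical hourly schedule kept as a sorted list of timestamps. With that assumption, picking the right snapshot is a binary search for the newest one taken at or before the requested time.

```python
import bisect
from datetime import datetime, timedelta
from typing import List, Optional


def snapshot_for(snapshots: List[datetime], target: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken at or before `target`.

    `snapshots` must be sorted oldest first. Returns None when every
    snapshot is newer than the requested time.
    """
    i = bisect.bisect_right(snapshots, target)  # count of snapshots <= target
    return snapshots[i - 1] if i else None


# Hypothetical schedule: one snapshot every hour from 08:00 to 14:00.
start = datetime(1999, 11, 1, 8, 0)
hourly = [start + timedelta(hours=h) for h in range(7)]

# At 14:40 a user asks for the file "as of about an hour ago".
now = datetime(1999, 11, 1, 14, 40)
print(snapshot_for(hourly, now - timedelta(hours=1)))  # 1999-11-01 13:00:00
```

Here the user gets the 13:00 copy, so anything written between 13:00 and 13:40 is still lost. A tighter schedule narrows that window.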
Storing Your Backups
It doesn't do any good to make really good backups, only to have your backup volumes destroyed, lost, or misplaced. You need to have a well-defined process for storing your media.
If you've read this far, you know that I consider your backups to be very important. If your backups are important, then isn't the media on which they reside just as important? That goes without saying, right? Well, you'd never know it from most volume "libraries." Volume "piles" is probably a more accurate term. How many computer rooms have you seen that have volumes spread out all over the place? They get stacked, piled, fall behind the systems, and a DLT cartridge works really well as a coaster for a coffee mug. (We wouldn't want to get any coffee rings on the new server, right?)
Have you ever really needed a volume and couldn't find it? I've been there. It's a horrible feeling to know that you've got the file on a volume, but can't find the darn volume! Why, then, do we treat our backup volumes like so much dirty laundry? Organize your backup volumes! Label them, catalog them, give them unique names or numbers, and put them in some sort of logical order in some sort of storage container. Do it, or the backup demon will come to haunt you!
Your ability to perform a large recovery quickly is directly related to how well organized your media is.
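The sketch below shows, as a small in-memory structure, what a volume catalog needs to record. It assumes each volume carries a unique label and that you know which files went onto it. The `Volume` and `VolumeCatalog` names and the label format are hypothetical. A real site would keep this information in a database or in a backup product's own catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Volume:
    label: str     # unique name or number written on the volume itself
    location: str  # where it physically is right now
    files: Set[str] = field(default_factory=set)  # paths backed up on it


class VolumeCatalog:
    """Minimal in-memory catalog of backup volumes (illustrative only)."""

    def __init__(self) -> None:
        self._volumes: Dict[str, Volume] = {}

    def add(self, volume: Volume) -> None:
        # Unique labels are the whole point: refuse duplicates.
        if volume.label in self._volumes:
            raise ValueError(f"duplicate volume label: {volume.label}")
        self._volumes[volume.label] = volume

    def move(self, label: str, new_location: str) -> None:
        # Record every trip, e.g. from the on-site cabinet to the off-site vendor.
        self._volumes[label].location = new_location

    def find(self, path: str) -> List[Volume]:
        """Return every volume holding `path`, ordered by label."""
        hits = [v for v in self._volumes.values() if path in v.files]
        return sorted(hits, key=lambda v: v.label)


catalog = VolumeCatalog()
catalog.add(Volume("WK45-001", "cabinet A, slot 1", {"/home/ann/budget.doc"}))
catalog.add(Volume("WK45-002", "cabinet A, slot 2", {"/etc/passwd"}))
catalog.move("WK45-001", "off-site vendor, box 7")
for v in catalog.find("/home/ann/budget.doc"):
    print(v.label, "->", v.location)  # WK45-001 -> off-site vendor, box 7
```

Whatever form the catalog takes, keep a copy off-site as well. The fire scenarios under "Deciding What to Back Up" destroy the very server that holds it.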
What about that media cabinet that you're using for your on-site volume storage? You don't have one, you say? You're using a file cabinet, you say? Well, use something, but if you can afford it, there are a number of companies that make storage containers for media. They also make cabinets that can withstand fire. Spend the money—you'll be glad you did. Doing a restore is so much less stressful when you can find the volume with no problem. Remember, though, that fireproof does not mean heat-proof. These types of media safes are meant to withstand brief fires that are quickly extinguished by a sprinkler system. If a fire burns for long right next to the container or raises the temperature in the room significantly, the volumes may be no good anyway. (This is another good reason why you also must store volumes off-site.)
Testing Your Backups
I wish there were enough to say about this to make it a separate chapter, because it's that important. I can't tell you how many stories I have heard about people who wait until they need a major restore before they test their backups. That's when they find out that they've been using the wrong device or the wrong blocking factor or that the device had I/O errors. This point cannot be stated strongly enough. If you don't test your backups, then you are guaranteed to get a surprise sooner or later.
It is important to test every type of restore. If you are testing filesystem backups, make sure you:
  • Restore many single files. Can you find the needle in the haystack?
  • Restore an older version of a file.
  • Restore an entire filesystem, and compare your results with the original. Are they the same size, and so on?
  • Pretend that an entire system is down, and try to re-create it.
  • Pretend that a particular volume is bad, and force yourself to use an alternate backup.
  • Retrieve a few volumes from your off-site storage vendor.
  • Pretend that your backup server is destroyed, and try to recover from that. (This one's tough!) This test is extremely important if you are using a commercial backup utility. Some products do not plan for this well, and you can find yourself in a real catch-22 situation.
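You can automate the "compare your results with the original" step. The sketch below assumes Python 3 and that both trees are reachable as local directories. It walks the original and the restored copy and compares each regular file's size and SHA-256 checksum. It deliberately ignores ownership, permissions, and timestamps, which a thorough test should check separately. Expect differences in any file that changed on the live filesystem after the backup ran.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def _digest(path: str) -> str:
    # Hash in 1 MiB chunks so large files don't have to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_manifest(root: str) -> Dict[str, Tuple[int, str]]:
    """Map each regular file's path (relative to root) to (size, sha256)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks, devices, sockets, etc.; only regular files are compared.
            if os.path.isfile(full) and not os.path.islink(full):
                rel = os.path.relpath(full, root)
                manifest[rel] = (os.path.getsize(full), _digest(full))
    return manifest


def compare_trees(original: str, restored: str) -> List[str]:
    """Describe every difference between two trees; an empty list means they match."""
    a, b = tree_manifest(original), tree_manifest(restored)
    problems = []
    for rel in sorted(a.keys() | b.keys()):
        if rel not in b:
            problems.append(f"missing from restore: {rel}")
        elif rel not in a:
            problems.append(f"extra in restore: {rel}")
        elif a[rel] != b[rel]:
            problems.append(f"contents differ: {rel}")
    return problems
```

For example, if `compare_trees("/home", "/restore-test/home")` returns an empty list (both paths are hypothetical), every regular file came back byte for byte.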
If you are testing database restores, make sure you:
  • Restore part of your database, pretending that you lost only one data file or disk drive, if this option is available.
Monitoring Your Backups
If you are not monitoring your backups, they are not doing what you think they are doing—guaranteed. This is one pot that will not boil if you don't watch it. Every backup should have a log that is examined daily. This can be automated as well. For example, here's how I automate the monitoring of dump backup logs:
Give me a summary
dump gives a whole bunch of messages that I couldn't care less about: Pass I, Pass II, % done, etc. When I'm monitoring the dump backups of hundreds of filesystems, most of that is so much noise. What I really want to see is what got dumped, where it went, when it went, what level it was, and the ever-popular DUMP IS DONE message. To get a summary of just these lines, the first thing I do is use grep -v to exclude the phrases I don't want, leaving only a few lines. This is much easier to review.
Show me anything weird
You can do this in either of two ways. If you know the phrases that show up when things go wrong, then grep for those. Another way is to use grep -v to remove all lines you're expecting, and see what's left. If there's nothing, then great! If there are lines left over, they are probably errors. You may see lines like I/O Error, Write error, or something else you don't like to see in your backups.
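Both filters come down to matching log lines against a list of patterns. The sketch below implements the `grep -v` approach and adds the plain search for known-bad phrases. The log is made up from the phrases quoted above. Real dump output is wordier and its wording differs between Unix versions, so treat the pattern lists as assumptions to tune against your own logs.

```python
import re
from typing import Iterable, List

# A made-up log assembled from the phrases mentioned above; real dump output
# is wordier and its exact wording differs between Unix versions.
LOG = [
    "DUMP: Pass I",
    "DUMP: Pass II",
    "DUMP: 50% done",
    "DUMP: level 0 of /home to /dev/rmt/0",  # hypothetical summary line
    "DUMP: I/O Error",
    "DUMP: DUMP IS DONE",
]

# Assumed pattern lists; tune them to the messages your dump really prints.
NOISE = [r"Pass I", r"% done"]                         # never worth reading
EXPECTED = NOISE + [r"level \d of ", r"DUMP IS DONE"]  # everything a healthy run prints
KNOWN_BAD = [r"I/O Error", r"Write error"]


def drop_matching(lines: Iterable[str], patterns: List[str]) -> List[str]:
    """Keep lines matching none of the patterns, like `grep -v`."""
    return [ln for ln in lines if not any(re.search(p, ln) for p in patterns)]


def keep_matching(lines: Iterable[str], patterns: List[str]) -> List[str]:
    """Keep lines matching at least one pattern, like plain `grep`."""
    return [ln for ln in lines if any(re.search(p, ln) for p in patterns)]


print(drop_matching(LOG, NOISE))      # "give me a summary"
print(drop_matching(LOG, EXPECTED))   # "show me anything weird": ['DUMP: I/O Error']
print(keep_matching(LOG, KNOWN_BAD))  # grep for known trouble: ['DUMP: I/O Error']
```

Removing what you expect is the safer of the two approaches, because it also catches error messages you never thought to search for.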
I don't care how good your backups are; they can always be better. You could spend every waking hour tweaking and improving every piece of your backup program, know everything there is to know about backups, and they could still be better. My backups will never be good enough. There's always a new bell or whistle on some other backup package, a bigger or smarter jukebox, a faster backup drive, or some scenario I thought of that I'm not covering. You must realize, however, that every change you make is a potential for data loss. A common thread that you will find in this book is that every time the human being enters into the equation, things can go wrong. You may be the best shell or Perl hacker in the world, and you will still make mistakes.
Following Proper Development Procedures
Don't make a change to your backup script and then roll it out to all your machines at once. Test it on a development system, or better yet, on a system that you don't normally back up. That way you aren't putting any backups in jeopardy. Another good practice is to test the change in parallel with what you're already doing. The bigger the change, the more important it is to do a parallel conversion. This is especially true if you're using a new method, rather than just enhancing your current one. Don't stop using your old method until you're sure that the new one works! Follow a plan similar to this:
  • Test the syntax of your new script somewhere where it really won't hurt anybody if it does something like, oh, crash the system!
  • Test the operation on a small scale on one system, using it in the same manner as you would in production. For example, if you are going to do both remote and local backups with this program, test both on a small scale.
  • Try to simulate every potential error the program might encounter:
    • Eject a volume in the middle of the backup.
    • Write-protect a volume.
    • Reboot the system you are backing up while it is backing up.
    • Drop the network connection and power down a disk drive.
    • Know the program and the errors for which it is testing, and simulate each one to test that section of your program.
  • Test on a small number of systems, preferably in parallel with your current method.
  • When you roll it out to all systems, definitely do so in parallel. One of the ways you can do this is to squeeze all your backups onto as few volumes as you can, then use the leftover drives to do the new backup in parallel. Your network guys might hate you, but it's really the only way to do a true parallel conversion. When I converted to my first commercial backup utility, I ran in this mode for almost a year.
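During a parallel run, one way to check the new method is to compare what each method says it backed up. The sketch below assumes you have already reduced each method's table of contents to one path per line. The function names are hypothetical. The path normalization, which drops a leading `./` or `/`, is an assumption about how the two listings differ.

```python
from typing import Iterable, List, Set, Tuple


def _normalize(path: str) -> str:
    # Assumption: the two listings differ only by a leading "./" or "/".
    path = path.strip()
    while path.startswith("./"):
        path = path[2:]
    return path.lstrip("/")


def load_manifest(lines: Iterable[str]) -> Set[str]:
    """Turn a one-path-per-line listing into a set, skipping blank lines."""
    return {p for p in map(_normalize, lines) if p}


def parallel_check(old: Set[str], new: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (missed_by_new, only_in_new); both empty means the methods agree."""
    return sorted(old - new), sorted(new - old)


old = load_manifest(["./etc/passwd", "./home/ann/budget.doc", ""])
new = load_manifest(["/etc/passwd"])
missed, extra = parallel_check(old, new)
print(missed)  # ['home/ann/budget.doc']
print(extra)   # []
```

A non-empty `missed` list on any night is exactly the kind of surprise a parallel run exists to catch.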
Unrelated Miscellanea
We were going to call this section "Oh, and by the way," but that seemed like a really weird heading.
One of the reasons that backups are unpopular is that people are worried that they might get fired if they do them wrong. People do get in trouble when restores don't go right, but following the suggestions in this section will help you to protect yourself from "recovery failure fallout."

Section 2.12.1.1: Self-preservation: document, document, document

Have you ever tried to go on vacation? If you're the only one who understands the restore process or the organization of your media, you can bet that you will be called if a big restore is required. Backups are one area of system administration in which inadequate documentation can really get you in trouble. It's hard to go on vacation, get promoted, or do anything that would pull you away from the magical area that only you know. Your backups and restores should be documented to the point that any system administrator can follow them step-by-step in your absence. That is actually a good way to test your documentation—have someone else try to use it.
The opposite of good documentation is, of course, bad, or nonexistent, documentation. Bad documentation is the surest way to help you find a new job. If you do ever manage to take a real vacation in which you don't carry a beeper, check your voice mail, or check your email, watch out. Murphy's law governs vacations as well. You can guarantee yourself that you, or more accurately, your coworkers, will have a major outage that week. If they crash and burn because you left them no clue as to how to perform a restore, they will be looking for you when you return. You will not be a popular person, and you just might find yourself combing through the want ads.
Good Luck
The chapters that follow explore in depth the various methods that you may employ to back up your systems. Most of these topics also are covered in documentation from the appropriate vendor; this book is not meant to be a replacement for that documentation. I try to explain things that are not covered in the documentation and possibly address some subjects more frankly than a manual provided by the vendor can.
Welcome to the world of backups.
Chapter 3: Native Backup & Recovery Utilities
Native utilities are the backup utilities that you find in a standard Unix distribution. I'll admit that these utilities are rather boring. They do nothing fancy and they have many limitations, some of which have been there since they were originally written to back up a PDP-11 to a 9-track tape. (In Sixth and Seventh Edition Unix, restore was still called restor, a throwback to the Multics days.) Some of these utilities have bugs that persist to this very day. (They've finally fixed the "tape-rewinding" bug in dump, but only on some Unix versions.)
Yet these native backup utilities do have a few features that have not been duplicated by commercial backup vendors. These features will always be there, and they don't cost extra. They also work basically the same everywhere, with only a few minor differences. Whether you're just starting out in the backup world or you're an experienced systems administrator, you need to be familiar with these utilities.
This chapter describes the benefits and pitfalls of several utilities. dump and restore are usually the best option if they are available. After dump and restore, cpio has the best functionality, but it is slightly less user friendly than its cousin tar. tar is incredibly easy to use and is much more portable than either dump or cpio. If you have to back up raw devices or perform remote backups with tar or cpio, dd will be your new best friend.
This chapter begins with an overview of each of these backup utilities. It then goes into detail about the syntax for each command for both backup and recovery. Finally, near the end of the chapter, there is an invaluable comparison chart that can be used as a quick reference guide for comparing dump, tar, and cpio.
An Overview
If you are responsible for backing up at least one Unix server, can't afford a commercial backup product, and don't want to trust your mission-critical backups to a public domain utility, then hopefully your version of Unix supports the commands dump and restore. You can't beat their flexibility and versatility for backing up and restoring an entire system. dump and restore are relatively sophisticated commands, with simple interfaces whose essential options are the same on most Unix systems. Some versions of Unix have changed the name and a few of the features of dump, but most of the changes are minor. dump can even be found on Unix-like systems such as Linux and Network Appliance boxes. Even if you don't plan on using dump for backups in the future, chances are you've got several dump volumes in a cabinet somewhere that you may need to read someday. When you do need to read those volumes, hopefully you will have this book handy.
If you do not have dump or you can't use the version you have, then cpio is your next best choice. cpio has been around longer than any other backup utility and has some very important features that other commands do not have. First, there are a few things that
Backing Up with the dump Utility
For many environments, dump may be all you need to ensure good-quality backups. To use dump and restore for regular system backups, you need to understand the following:
  • How to use dump to back up a filesystem (with the appropriate options)
  • How the backup ends up on the volume
  • How to get the table of contents of a dump volume
  • How to manipulate the volume and restore from a backup created by dump
  • The limitations of dump and restore
  • What you should be doing if you are using dump on a regular basis
The first thing to understand is what your dump command is and what its options are. See Table 3-1 for a listing of dump commands on various Unix versions. What follows is essentially a unified manpage for these dump-like commands on specific operating systems.
Table 3-1: dump-Like Commands on Different Unix Versions
HP-UX 9.x
HP-UX 10
SunOS
IRIX
Solaris
SCO
Network Appliance
AIX
Linux
SGI
Restoring with the restore Utility