Youâve been creating tables without giving much thought to them. And thatâs fine, they work. You can SELECT
, INSERT
, DELETE
, and UPDATE
with them. But as you get more data, you start seeing things you wish youâd done to make your WHERE
clauses simpler. What you need is to make your tables more normal.
Jack and Mark both created tables to store information about record-setting fish. Markâs table has columns for the species and common names of the fish, its weight, and where it was caught. It doesnât include the names of the people who caught the fish.
Jackâs table has the common name and weight of the fish, but it also contains the first and last names of the people who caught them, and it breaks down the location into a column containing the name of the body of water where the fish was caught, and a separate state column.
SQL is known as a Relational Database Management System, or RDBMS. Donât bother memorizing it. We only care about the word RELATIONAL[1]. All this means to you is that to design a killer table, you need to consider how the columns relate to each other to describe a thing.
The challenge is to describe the thing using columns in a way that makes getting the information out of it easy. This depends on what you need from the table, but there are some very broad steps you can follow when youâre creating a table.
Pick your thing, the one thing you want your table to describe.
Make a list of the information you need to know about your one thing when youâre using the table.
Using the list, break down the information about your thing into pieces you can use for organizing your table.
We could, but we donât need the data broken down to that level.
At least, not in this case. If Jack had been writing an article about the best places to go on vacation and catch a big fish, then he might have wanted the street number and name so readers could find accommodations nearby.
But Jack only needed location and state, so he only added as many columns as he needed to save space in his database. At that point, he decided his data was broken down enoughâit is atomic.
Whatâs an atom? A little piece of information that canât or shouldnât be divided. Itâs the same for your data. When itâs ATOMIC, that means that itâs been broken down into the smallest pieces of data that canât or shouldnât be divided.
Consider a pizza delivery guy. To get to where heâs going, he just needs a street number and address in a single column. For his purposes, thatâs atomic. He never needs to look for a single street number on its own.
In fact, if his data were broken into street number and street name, his queries would have to be longer and more complicated, making it take him longer to get the pizza to your front door.
When your data consultancy takes off and you need to hire more SQL database designers, wouldnât it be great if you didnât need to waste hours explaining how your tables work?
Well, making your tables NORMAL means they follow some standard rules your new designers will understand. And the good news is, our tables with atomic data are halfway there.
Making your data atomic is the first step in creating a NORMAL table.
Normal tables wonât have duplicate data, which will reduce the size of your database.
With less data to search through, your queries will be faster.
Because, even when your tables are tiny, it adds up. And tables grow. If you begin with a normalized table, you wonât have to go back and change your table when your queries go too slowly.
Remember the clown table? Clown tracking has become a nationwide craze, and our old table isnât going to cut it because the appearance
and activities
columns contain so much data. For our purposes, this table is not atomic.
clown_info
name | last_seen | appearance | activities |
---|---|---|---|
Elsie | Cherry Hill Senior Center | F, red hair, green dress, huge feet | balloons, little car |
Pickles | Jack Greenâs party | M, orange hair, blue suit, huge feet | mime |
Snuggles | Ball-Mart | F, yellow shirt, baggy blue pants | horn, umbrella |
Mr. Hobo | Eric Grayâs Party | M, cigar, black hair, tiny hat | violin |
Clarabelle | Belmont Senior Center | F, pink hair, huge flower, blue dress | yelling, dancing |
Scooter | Oakland Hospital | M, blue hair, red suit, huge nose | balloons |
Zippo | Millstone Mall | F, orange suit, baggy pants | singing |
Babe | Earlâs Autos | F, all pink and sparkly | balancing, little car |
Bonzo | Dickson Park | M, in drag, polka dotted dress | singing, dancing |
Sniffles | Tracyâs | M, green and purple suit, pointy nose | climbing into tiny car |
Remember, our table is only about halfway normal when itâs got atomic data in it. When weâre completely normal weâll be in the FIRST NORMAL FORM or 1NF.
To be 1NF, a table must follow these two rules:
Each row of data must contain atomic values.
Each row of data must have a unique identifier, known as a Primary Key.
The column in your table that will be your primary key has to be designated as such when you create the table. In a few pages, weâll create a table and designate a primary key, but before that, letâs take a closer look at what a primary key is.
A primary key is a column in your table that makes each record unique.
The primary key is used to uniquely identify each record
Which means that the data in the primary key column canât be repeated. Consider a table with the columns shown below. Do you think any of those would make good primary keys?
Watch it!
Take care using SSNs as the Primary Keys for your records.
With identity theft only increasing, people donât want to give out SSNsâand with good reason. Theyâre too important to risk. Can you absolutely guarantee that your database is secure? If itâs not, all those SSNs can be stolen, along with your customersâ identities.
If itâs null, it canât be unique because other records can also be NULL.
The primary key must be given a value when the record is inserted
When you insert a record without a primary key, you run the risk of ending up with a NULL primary key and duplicate rows in your table, which violates First Normal Form.
The primary key must be compact
A primary key should contain only the information it needs to to be unique and nothing extra.
The primary key values canât be changed
If you could change the value of your key, youâd risk accidentally setting it to a value you already used. Remember, it has to remain unique.
Brain Power
Given all these rules, can you think of a good primary key to use in a table?
Look back through the tables in the book. Do any of them have a column that contains truly unique values?
The best primary key may be a new primary key.
When it comes to creating primary keys, your best bet may be to create a column that contains a unique number. Think of a table with peopleâs info, but with an additional column containing a number. In the example below, letâs call it ID
.
If it werenât for the ID
column, the records for John Brown would be identical. But in this case, theyâre actually two different people. The ID
column makes these records unique. This table is in first normal form.
Geek Bits
Thereâs a big debate in the SQL world about using synthetic, or made-up, primary keys (like the ID column above) versus using natural keysâdata that is already in the table (like a VIN number on a car or SSN number). We wonât take sides, but we will discuss primary keys in more detail in Chapter 7.
Itâs time to step back and normalize our tables. We need to make our data atomic and add primary keys. Creating a primary key is normally something we do when we write our CREATE TABLE
code.
From what youâve seen so far, this is how youâd have to fix Gregâs table:
Fixing Gregâs table Step 1: SELECT all of your data and save it somehow.
Fixing Gregâs table Step 2: Create a new normal table.
Fixing Gregâs table Step 3: INSERT all that old data into the new table, changing each row to match the new table structure.
So now you can drop your old table.
So, we know that Gregâs table isnât perfect.
Itâs not atomic and it has no primary key. But luckily for Greg, you donât have to live with the old table, and you donât have to dump your data.
We can add a primary key to Gregâs table and make the columns more atomic using just one new command. But first, letâs take a little trip to the past...
Greg needs a primary key, and after all the talk about atomic data, he realizes there are a few things he could do to make his columns more atomic. Before we look at how to fix the existing table, letâs look at how we could have created the table in the first place!
Hereâs the table we created way back in Chapter 1.
What if you use the DESCRIBE my_contacts
command to look at the code you used when you set up the table? Youâll see something that looks a lot like this:
But we really want to look at the CREATE
code here, not the fields in the table, so we can figure out what we should have done at the very beginning without having to write the CREATE
statement over again.
The statement SHOW CREATE_TABLE
will return a CREATE TABLE
statement that can exactly recreate our table, minus any data in it. This way, you can always see how the table you are looking at could be created. Try it:
SHOW CREATE TABLE my_contacts;
Take a look at the code we used to create the table in The CREATE TABLE we wrote, and the code below that the SHOW CREATE TABLE my_contacts
gives you. They arenât identical, but if you paste the code below into a CREATE TABLE
command, the end result will be the same. You donât need to remove the backticks or data settings, but itâs neater if you do.
Although you could make the code neater (by removing the last line and backticks), you can just copy and paste it to create a table.
Hereâs the code our SHOW CREATE TABLE my_contacts
gave us. We removed the backticks and last line. At the top of the column list we added a contact_id
column that weâre setting to NOT NULL
, and at the bottom of the list, weâre add a line PRIMARY KEY
, which we set to use our new contact_id
column as the primary key.
Adding the keyword AUTO_INCREMENT
to our contact_id
column makes our SQL software automatically fill that column with a value that starts on row 1 with a value of 1 and goes up in increments of 1.
What do you think will happen?
Better yet, try it out for yourself and see what happens.
You wonât have to start over; instead, you can use an ALTER statement.
A table with data in it doesnât have to be dumped, then dropped, then recreated. We can actually change an existing table. But to do that, weâre going to borrow the ALTER
statement and some of its keywords from Chapter 5.
Hereâs the code to add an AUTO_INCREMENT
primary key to Gregâs my_contacts
table. (Itâs a long command, so youâll need to turn your book.)
Try the code yourself. Open your SQL terminal. USE
the gregs_list
database, and type in this command:
To see what happened to your table, try a SELECT * from my_contacts;
Will Greg get his phone number column? Turn to Chapter 5 to find out.
Youâve got Chapter 4 under your belt. Look at all the new tools youâve added to your toolbox now! For a complete list of tooltips in the book, see Appendix C.
[1] Some people think that RELATIONAL means multiple tables relating to each other. Thatâs not correct.
Get Head First SQL now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.