Chapter 4. Smart Table Design: Why be normal?

You've been creating tables without giving much thought to them. And that's fine, they work. You can SELECT, INSERT, DELETE, and UPDATE with them. But as you get more data, you start seeing things you wish you'd done to make your WHERE clauses simpler. What you need is to make your tables more normal.

Two fishy tables

Jack and Mark both created tables to store information about record-setting fish. Mark's table has columns for the species and common names of the fish, its weight, and where it was caught. It doesn't include the names of the people who caught the fish.

Jack's table has the common name and weight of the fish, but it also contains the first and last names of the people who caught them, and it breaks down the location into a column containing the name of the body of water where the fish was caught, and a separate state column.

There are no Dumb Questions

Q: So Jack's table is better than Mark's?
A: No. They're different tables with different purposes. Mark will rarely need to search directly for a state because he only really cares about the species and common names of the record-breaking fish and how much they weighed.
Jack, on the other hand, will need to search for states when he's querying his data. That's why his table has a separate column: to allow him to easily target states in his searches.
Q: Should we avoid LIKE when querying our tables? Is there something wrong with it?
A: There's nothing wrong with LIKE, but it can be difficult to use in your queries, and you risk getting results you don't want. If your columns contain complicated information, LIKE isn't specific enough to target precise data.
Q: Why are shorter queries better than longer ones?
A: The simpler the query, the better. As your database grows, and as you add in new tables, your queries will get more complicated. If you start with the simplest possible query now, you'll appreciate it later.
Q: So are you saying I should always have tiny bits of data in my columns?
A: Not necessarily. As you're starting to see with Mark's and Jack's tables, it depends on how you'll use the data.
For example, imagine a table listing cars for a mechanic and one for a car salesman. The mechanic might need precise information on each car, but the salesman might only need the car's make, model, and VIN.
Q: Suppose we had a street address. Why couldn't we have one column with the entire address, then other columns that break it apart?
A: While duplicating your data might seem like a good idea to you now, consider how much room on your hard drive it will take up when your database grows to an enormous size. And each time you duplicate your data, that's one more clause in an UPDATE statement you'll have to remember to add when your data changes.
Let's take a closer look at how to design your tables the best possible way for your use.
How you're going to use your data will affect how you set up your table.
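
To make the difference between the two fish tables concrete, here is a rough sketch in MySQL-style SQL. The table names, column names, and data types are assumptions made for illustration; the chapter only describes the columns in words.

```sql
-- Hypothetical sketches of the two tables (names and types are assumed).
DROP TABLE IF EXISTS fish_info_sketch;
DROP TABLE IF EXISTS fish_records_sketch;

-- Mark's style: the whole location lives in one free-form column.
CREATE TABLE fish_info_sketch
(
  common   VARCHAR(40),
  species  VARCHAR(60),
  location VARCHAR(60),   -- e.g. body of water and state together
  weight   VARCHAR(20)
);

-- Jack's style: who caught it, plus location split into water and state.
CREATE TABLE fish_records_sketch
(
  first_name VARCHAR(20),
  last_name  VARCHAR(30),
  common     VARCHAR(40),
  location   VARCHAR(60), -- body of water only
  state      CHAR(2),
  weight     VARCHAR(20)
);

-- Searching by state in Mark's table needs pattern matching, which can
-- also match location text you did not mean to target.
SELECT common, weight FROM fish_info_sketch WHERE location LIKE '%GA';

-- In Jack's table the same question is an exact comparison.
SELECT common, weight FROM fish_records_sketch WHERE state = 'GA';
```

Neither design is wrong. The second query is simply shorter and more precise for someone who searches by state often.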

Brain Power

SQL is the language used by relational databases. What do you think "relational" means in an SQL database?

A table is all about relationships

SQL is the language used by a Relational Database Management System, or RDBMS. Don't bother memorizing it. We only care about the word RELATIONAL^[1]. All this means to you is that to design a killer table, you need to consider how the columns relate to each other to describe a thing.

The challenge is to describe the thing using columns in a way that makes getting the information out of it easy. This depends on what you need from the table, but there are some very broad steps you can follow when you're creating a table.

Pick your thing, the one thing you want your table to describe.
Note
What's the main thing you want your table to be about?
Make a list of the information you need to know about your one thing when you're using the table.
Note
How will you use this table?
Using the list, break down the information about your thing into pieces you can use for organizing your table.
Note
How can you most easily query this table?

We could, but we don't need the data broken down to that level.

At least, not in this case. If Jack had been writing an article about the best places to go on vacation and catch a big fish, then he might have wanted the street number and name so readers could find accommodations nearby.

But Jack only needed location and state, so he only added as many columns as he needed to save space in his database. At that point, he decided his data was broken down enough: it is atomic.

Brain Power

What do you think the word atomic means in terms of SQL data?

Atomic data

What's an atom? A little piece of information that can't or shouldn't be divided. It's the same for your data. When it's ATOMIC, that means that it's been broken down into the smallest pieces of data that can't or shouldn't be divided.

30 minutes or it's free

Consider a pizza delivery guy. To get to where he's going, he just needs the street number and street name together in a single column. For his purposes, that's atomic. He never needs to look for a single street number on its own.

In fact, if his data were broken into street number and street name, his queries would have to be longer and more complicated, making it take him longer to get the pizza to your front door.

Location, location, location

Now consider a realtor. He might want to have a separate column for the street number. He may want to query on a given street to see all the houses for sale by street number. For him, street number and street name are each atomic.
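
Here is one way the two address designs might look side by side. This is only a sketch: the table names, columns, and the 'Maple Street' value are invented for illustration.

```sql
-- Hypothetical tables; every name and type here is an assumption.
DROP TABLE IF EXISTS pizza_deliveries_sketch;
DROP TABLE IF EXISTS house_listings_sketch;

-- Delivery guy: the full street address is one atomic value for him.
CREATE TABLE pizza_deliveries_sketch
(
  order_number INT,
  address      VARCHAR(100)
);

-- Realtor: street number and street name are each atomic for him.
CREATE TABLE house_listings_sketch
(
  street_number INT,
  street_name   VARCHAR(60),
  price         DECIMAL(12,2)
);

-- The delivery guy only ever looks up a whole address.
SELECT address FROM pizza_deliveries_sketch WHERE order_number = 42;

-- The realtor can list every house on one street, in house-number order.
SELECT street_number, price
FROM house_listings_sketch
WHERE street_name = 'Maple Street'
ORDER BY street_number;
```

The same address data ends up split differently because the two people ask different questions of it.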

Atomic data and your tables

There are some questions you can ask to help you figure out what you need to put in your tables:

What is the one thing your table describes?
Does your table describe clowns, cows, doughnuts, people?
How will you use the table to get at the one thing?
Note
Design your table to be easy to query!
Do your columns contain atomic data to make your queries short and to the point?

There are no Dumb Questions

Q: Aren't atoms tiny, though? Shouldn't I be breaking my data down into really tiny pieces?
A: No. Making your data atomic means breaking it down into the smallest pieces that you need to create an efficient table, not just the smallest possible pieces you can.
Don't break down your data any more than you have to. If you don't need extra columns, don't add them just for the sake of it.
Q: How does atomic data help me?
A: It helps you ensure that the data in your table is accurate. For example, if you have a column for street numbers, you can make sure that only numbers end up in that column.
Atomic data also lets you perform queries more efficiently because the queries are easier to write and take less time to run, which adds up when you have a massive amount of data stored.

Exercise

Now that you know the official rules and the three steps to making data atomic, take a look at each table from earlier in this book and explain why it is or isnât atomic.

Greg's table, NOT NULL appears in DESC _________________________________________________________

Donut rating table, Doughnut ask what your table can do for you... ___________________________________________________

Clown table, Clown tracking _________________________________________________________

Drink table, Exercise __________________________________________________________

Fish info, Two fishy tables ___________________________________________________________

Reasons to be normal

When your data consultancy takes off and you need to hire more SQL database designers, wouldn't it be great if you didn't need to waste hours explaining how your tables work?

Well, making your tables NORMAL means they follow some standard rules your new designers will understand. And the good news is, our tables with atomic data are halfway there.

Making your data atomic is the first step in creating a NORMAL table.

The benefits of normal tables

Normal tables won't have duplicate data, which will reduce the size of your database.
With less data to search through, your queries will be faster.
This matters even when your tables are tiny, because it adds up. And tables grow. If you begin with a normalized table, you won't have to go back and change your table when your queries get too slow.

Clowns aren't normal

Remember the clown table? Clown tracking has become a nationwide craze, and our old table isn't going to cut it because the appearance and activities columns contain so much data. For our purposes, this table is not atomic.

Note

These two columns are really difficult to query because they contain so much data!

clown_info

name	last_seen	appearance	activities
Elsie	Cherry Hill Senior Center	F, red hair, green dress, huge feet	balloons, little car
Pickles	Jack Green's party	M, orange hair, blue suit, huge feet	mime
Snuggles	Ball-Mart	F, yellow shirt, baggy blue pants	horn, umbrella
Mr. Hobo	Eric Gray's Party	M, cigar, black hair, tiny hat	violin
Clarabelle	Belmont Senior Center	F, pink hair, huge flower, blue dress	yelling, dancing
Scooter	Oakland Hospital	M, blue hair, red suit, huge nose	balloons
Zippo	Millstone Mall	F, orange suit, baggy pants	singing
Babe	Earl's Autos	F, all pink and sparkly	balancing, little car
Bonzo	Dickson Park	M, in drag, polka dotted dress	singing, dancing
Sniffles	Tracy's	M, green and purple suit, pointy nose	climbing into tiny car

Halfway to 1NF

Remember, our table is only about halfway normal when it's got atomic data in it. When we're completely normal we'll be in FIRST NORMAL FORM, or 1NF.

To be 1NF, a table must follow these two rules:

Each row of data must contain atomic values.

Note

We already know how to do this.

Each row of data must have a unique identifier, known as a Primary Key.

Note

To make our tables completely normal, we need to give each record a Primary Key.

Brain Power

What types of columns do you think would make good Primary Keys?

PRIMARY KEY rules

The column in your table that will be your primary key has to be designated as such when you create the table. In a few pages, we'll create a table and designate a primary key, but before that, let's take a closer look at what a primary key is.

A primary key is a column in your table that makes each record unique.

The primary key is used to uniquely identify each record

Which means that the data in the primary key column can't be repeated. Consider a table with the columns shown below. Do you think any of those would make good primary keys?

Watch it!

Take care using SSNs as the Primary Keys for your records.

With identity theft only increasing, people don't want to give out SSNs, and with good reason. They're too important to risk. Can you absolutely guarantee that your database is secure? If it's not, all those SSNs can be stolen, along with your customers' identities.

A primary key can't be NULL

If it's null, it can't be unique because other records can also be NULL.

The primary key must be given a value when the record is inserted

When you insert a record without a primary key, you run the risk of ending up with a NULL primary key and duplicate rows in your table, which violates First Normal Form.

The primary key must be compact

A primary key should contain only the information it needs to be unique and nothing extra.

The primary key values can't be changed

If you could change the value of your key, you'd risk accidentally setting it to a value you already used. Remember, it has to remain unique.

Brain Power

Given all these rules, can you think of a good primary key to use in a table?

Look back through the tables in the book. Do any of them have a column that contains truly unique values?

The best primary key may be a new primary key.

When it comes to creating primary keys, your best bet may be to create a column that contains a unique number. Think of a table with people's info, but with an additional column containing a number. In the example below, let's call it ID.

If it weren't for the ID column, the records for John Brown would be identical. But in this case, they're actually two different people. The ID column makes these records unique. This table is in first normal form.
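
A minimal sketch of the John Brown situation, assuming a made-up table name and column sizes (the chapter only names the ID column and the two John Browns):

```sql
-- Hypothetical table; only the idea of an ID column comes from the text.
DROP TABLE IF EXISTS people_sketch;

CREATE TABLE people_sketch
(
  id         INT NOT NULL,
  first_name VARCHAR(20),
  last_name  VARCHAR(30),
  PRIMARY KEY (id)          -- every row must carry a distinct, non-NULL id
);

-- Two different people with the same name: the id keeps the rows distinct.
INSERT INTO people_sketch (id, first_name, last_name) VALUES (1, 'John', 'Brown');
INSERT INTO people_sketch (id, first_name, last_name) VALUES (2, 'John', 'Brown');

-- Reusing an id would be rejected with a duplicate-key error, for example:
-- INSERT INTO people_sketch (id, first_name, last_name) VALUES (2, 'John', 'Brown');

-- You can now target exactly one of the John Browns.
SELECT * FROM people_sketch WHERE id = 2;
```

The names can repeat as often as they like. Only the id has to stay unique.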

Geek Bits

There's a big debate in the SQL world about using synthetic, or made-up, primary keys (like the ID column above) versus using natural keys, which are data already in the table (like the VIN on a car or an SSN). We won't take sides, but we will discuss primary keys in more detail in Chapter 7.

There are no Dumb Questions

Q: You said "first" normal form. Does that mean there's a second normal form? Or a third?
A: Yes, there are indeed second and third normal forms, each one adhering to increasingly rigid sets of rules. We'll cover second and third normal form in Chapter 7.
Q: So we've changed our tables to have atomic values. Are any of them in 1NF yet?
A: No. So far, not a single table we've created has a primary key, a unique value.
Q: The comments column in the doughnut table really doesn't seem atomic to me. I mean, there's no reasonable way to query that column easily.
A: You're absolutely correct. That field is not particularly atomic, but then our design of the table didn't require it to be. If we wanted to restrict the comments to a specific predetermined set of words, that field could be atomic. But then it wouldn't contain true, spontaneous comments.

Getting to NORMAL

It's time to step back and normalize our tables. We need to make our data atomic and add primary keys. Creating a primary key is normally something we do when we write our CREATE TABLE code.

Brain Power

Do you remember how to add columns to an existing table?

Fixing Greg's table

From what you've seen so far, this is how you'd have to fix Greg's table:

Step 1: SELECT all of your data and save it somehow.
Step 2: Create a new normal table.
Step 3: INSERT all that old data into the new table, changing each row to match the new table structure.
So now you can drop your old table.
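
The steps above can be sketched roughly as follows. Everything here is an assumption for illustration: the table names, the three stand-in columns, and the sample row are made up. The sketch also leans on two things the chapter has not covered at this point: INSERT ... SELECT, which reads the old rows and inserts them in one statement, and the MySQL keyword AUTO_INCREMENT, which numbers the key column automatically.

```sql
-- Stand-in for the old table: no primary key (names and columns assumed).
DROP TABLE IF EXISTS contacts_old_sketch;
DROP TABLE IF EXISTS contacts_new_sketch;

CREATE TABLE contacts_old_sketch
(
  last_name  VARCHAR(30),
  first_name VARCHAR(20),
  email      VARCHAR(50)
);
INSERT INTO contacts_old_sketch VALUES ('Brown', 'John', 'john@example.com');

-- Step 2: create the new, normal table with a primary key.
CREATE TABLE contacts_new_sketch
(
  contact_id INT NOT NULL AUTO_INCREMENT,
  last_name  VARCHAR(30),
  first_name VARCHAR(20),
  email      VARCHAR(50),
  PRIMARY KEY (contact_id)
);

-- Steps 1 and 3 together: select the old rows and insert them into the
-- new structure, letting the server generate each contact_id.
INSERT INTO contacts_new_sketch (last_name, first_name, email)
SELECT last_name, first_name, email
FROM contacts_old_sketch;

-- Check the copy before letting go of the original.
SELECT * FROM contacts_new_sketch;

-- Only once you are happy with the new table:
DROP TABLE contacts_old_sketch;
```

It works, but it is a lot of ceremony for a table that already holds data.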

So, we know that Greg's table isn't perfect.

It's not atomic and it has no primary key. But luckily for Greg, you don't have to live with the old table, and you don't have to dump your data.

We can add a primary key to Greg's table and make the columns more atomic using just one new command. But first, let's take a little trip to the past...

The CREATE TABLE we wrote

Greg needs a primary key, and after all the talk about atomic data, he realizes there are a few things he could do to make his columns more atomic. Before we look at how to fix the existing table, let's look at how we could have created the table in the first place!

Here's the table we created way back in Chapter 1.

Brain Power

But what if you don't have your old CREATE TABLE printed anywhere? Can you think of some way to get at the code?

Show me the code

What if you use the DESCRIBE my_contacts command to look at the code you used when you set up the table? You'll see something that looks a lot like this:

But we really want to look at the CREATE code here, not the fields in the table, so we can figure out what we should have done at the very beginning without having to write the CREATE statement over again.

The statement SHOW CREATE TABLE will return a CREATE TABLE statement that can exactly recreate our table, minus any data in it. This way, you can always see how the table you are looking at could be created. Try it:

SHOW CREATE TABLE my_contacts;

Time-saving command

Take a look at the code we used to create the table in The CREATE TABLE we wrote, and the code below that the SHOW CREATE TABLE my_contacts gives you. They aren't identical, but if you paste the code below into a CREATE TABLE command, the end result will be the same. You don't need to remove the backticks or data settings, but it's neater if you do.

Although you could make the code neater (by removing the last line and backticks), you can just copy and paste it to create a table.

Note

Unless you've deleted the original table, you'll have to give this one a new name.

The CREATE TABLE with a PRIMARY KEY

Here's the code our SHOW CREATE TABLE my_contacts gave us. We removed the backticks and last line. At the top of the column list we added a contact_id column that we're setting to NOT NULL, and at the bottom of the list we added a PRIMARY KEY line, which sets our new contact_id column as the primary key.
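
As a rough sketch of the shape being described, the statement might look something like this. The table name and the three contact columns are stand-ins chosen for illustration; Greg's real my_contacts table has its own column list, which is not reproduced here.

```sql
-- Sketch only: stand-in name and columns, not Greg's full table.
DROP TABLE IF EXISTS my_contacts_keyed_sketch;

CREATE TABLE my_contacts_keyed_sketch
(
  contact_id INT NOT NULL,              -- new column at the top of the list
  last_name  VARCHAR(30) DEFAULT NULL,
  first_name VARCHAR(20) DEFAULT NULL,
  email      VARCHAR(50) DEFAULT NULL,
  PRIMARY KEY (contact_id)              -- new line at the bottom of the list
);

-- With this design you supply a fresh contact_id yourself on every INSERT.
INSERT INTO my_contacts_keyed_sketch (contact_id, last_name, first_name, email)
VALUES (1, 'Brown', 'John', 'john@example.com');
```

The two additions are the contact_id column itself and the PRIMARY KEY line that points at it.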

There are no Dumb Questions

Q: So you say that the PRIMARY KEY can't be NULL. What else keeps it from being duplicated?
A: Basically, you do. When you INSERT values into your table, you'll insert a value in the contact_id column that's new each time. For example, the first INSERT statement will set contact_id to 1, the next contact_id will be 2, etc.
Q: That's quite a pain to have to assign a new value to that PRIMARY KEY column each time I insert a new record. Isn't there an easier way?
A: There are two ways. One is using a column in your data that you know is unique as a primary key. We've mentioned that this is tricky (for example, the problem with using Social Security Numbers).
The easy way is to create an entirely new column just to hold a unique value, such as contact_id on the facing page. You can tell your SQL software to automatically fill in a number for you using keywords. Turn the page for details.
Q: Can I use SHOW for anything else besides the CREATE command?
A: You can use SHOW to display individual columns in your table:
SHOW COLUMNS FROM tablename; This command will display all the columns in your table and their data type along with any other column-specific details.
SHOW CREATE DATABASE databasename; Just like SHOW CREATE TABLE, you'll get the command that would exactly recreate your database.
SHOW INDEX FROM tablename; This command will display any columns that are indexed and what type of index they have. So far, the only indexes we've looked at are primary keys, but this command will become more useful as you learn more.
And there's one more command that's VERY useful:
SHOW WARNINGS;
If you get a message on your console that your SQL command has caused warnings, type this to see the actual warnings.
There are quite a few more, but those are the ones that are related to things we've done so far.
Q: So what's up with that backtick character that shows up when I use a SHOW CREATE TABLE? Are you sure I don't need it?
A: It exists because sometimes your RDBMS might not be able to tell a column name is a column name. If you use the backticks around your column names, you can actually (although it's a very bad idea) use a reserved SQL keyword as a column name.
For example, suppose you wanted to name a column select for some bizarre reason. This column declaration wouldn't work:
select varchar(50)
But this declaration would work:
`select` varchar(50)
Q: What's wrong with using keywords as column names, then?
A: You're allowed to, but it's a bad idea. Imagine how confusing your queries would become, and the annoyance of typing those backticks when you can get away with not using them. Besides, select isn't a very good column name; it tells you nothing about what data is in it.

1, 2, 3... auto incrementally

Adding the keyword AUTO_INCREMENT to our contact_id column makes our SQL software automatically fill that column with a value that starts on row 1 with a value of 1 and goes up in increments of 1.
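
Here is roughly where the keyword goes. This is a sketch with a stand-in table name and only two contact columns, not Greg's full table.

```sql
-- Sketch only: the table name and the two name columns are stand-ins.
DROP TABLE IF EXISTS contacts_auto_sketch;

CREATE TABLE contacts_auto_sketch
(
  contact_id INT NOT NULL AUTO_INCREMENT,  -- the server fills this in
  last_name  VARCHAR(30),
  first_name VARCHAR(20),
  PRIMARY KEY (contact_id)                 -- an AUTO_INCREMENT column must be a key
);

-- contact_id is left out of the column list, so a value is generated per row.
INSERT INTO contacts_auto_sketch (last_name, first_name) VALUES ('Brown', 'John');
INSERT INTO contacts_auto_sketch (last_name, first_name) VALUES ('Brown', 'John');

-- Look at the numbers the server assigned.
SELECT contact_id, first_name, last_name FROM contacts_auto_sketch;
```

AUTO_INCREMENT is MySQL's spelling. Other database systems offer the same idea under different keywords.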

What do you think will happen?

Better yet, try it out for yourself and see what happens.

Exercise

Write a CREATE TABLE statement below to store first and last names of people. Your table should have a primary key column with AUTO_INCREMENT and two other atomic columns.
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
Open your SQL terminal or GUI interface and run your CREATE TABLE statement.

Try out each of the INSERT statements below. Circle the ones that work.

INSERT INTO your_table (id, first_name, last_name)
VALUES (NULL, 'Marcia', 'Brady');

INSERT INTO your_table (id, first_name, last_name)
VALUES (1, 'Jan', 'Brady');

INSERT INTO your_table
VALUES ('', 'Bobby', 'Brady');

INSERT INTO your_table (first_name, last_name)
VALUES ('Cindy', 'Brady');

INSERT INTO your_table (id, first_name, last_name)
VALUES (99, 'Peter', 'Brady');

Did all the Bradys make it? Sketch your table and its contents after trying the INSERT statements.
your_table

id	first_name	last_name

Exercise Solution

Write a CREATE TABLE statement below. Your table should have a primary key column with AUTO_INCREMENT and two other atomic columns.
```
CREATE TABLE your_table
(
id INT NOT NULL AUTO_INCREMENT,
first_name VARCHAR(20),
last_name VARCHAR(30),
PRIMARY KEY (id)
);
```
Open your SQL terminal or GUI interface and run your CREATE TABLE statement.
Try out each of the INSERT statements below. Circle the ones that work.
Did all the Bradys make it? Sketch your table and its contents after trying the INSERT statements.
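
If you want to check your circles, here is an annotated run-through of the five statements. It assumes your_table was just created from the solution above and is still empty, and that you run the statements one at a time so an error on one does not stop the rest. The comments describe typical MySQL behavior; the third statement in particular depends on your server's SQL mode, so treat those notes as a guide rather than a guarantee.

```sql
-- NULL in an AUTO_INCREMENT column asks MySQL to generate the next value.
INSERT INTO your_table (id, first_name, last_name)
VALUES (NULL, 'Marcia', 'Brady');

-- An explicit id works only if it is not already taken. If Marcia was
-- given id 1, this one is rejected with a duplicate-key error.
INSERT INTO your_table (id, first_name, last_name)
VALUES (1, 'Jan', 'Brady');

-- '' is not a number. Strict SQL modes reject it with an error; non-strict
-- modes typically accept it with a warning and generate the next id.
INSERT INTO your_table
VALUES ('', 'Bobby', 'Brady');
SHOW WARNINGS;

-- Leaving id out of the column list lets the server generate it.
INSERT INTO your_table (first_name, last_name)
VALUES ('Cindy', 'Brady');

-- An unused explicit id is accepted as given.
INSERT INTO your_table (id, first_name, last_name)
VALUES (99, 'Peter', 'Brady');

-- See who actually made it into the table.
SELECT * FROM your_table ORDER BY id;
```

Whatever your server does with Bobby, the final SELECT is the real answer key for your own setup.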

You won't have to start over; instead, you can use an ALTER statement.

A table with data in it doesn't have to be dumped, then dropped, then recreated. We can actually change an existing table. But to do that, we're going to borrow the ALTER statement and some of its keywords from Chapter 5.

Adding a PRIMARY KEY to an existing table

Here's the code to add an AUTO_INCREMENT primary key to Greg's my_contacts table. (It's a long command, so you'll need to turn your book.)
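
A sketch of what such a command can look like in MySQL, run against a small stand-in table so it can be tried safely. The table name, its three columns, and the sample rows are assumptions; against Greg's real table you would put my_contacts in the ALTER TABLE line instead.

```sql
-- Stand-in for an existing table that has data but no primary key.
DROP TABLE IF EXISTS my_contacts_alter_sketch;

CREATE TABLE my_contacts_alter_sketch
(
  last_name  VARCHAR(30),
  first_name VARCHAR(20),
  email      VARCHAR(50)
);
INSERT INTO my_contacts_alter_sketch VALUES ('Brown', 'John', 'john@example.com');
INSERT INTO my_contacts_alter_sketch VALUES ('Brown', 'John', 'jbrown@example.com');

-- Add the new column as the FIRST column and make it the primary key,
-- all in one ALTER TABLE statement.
ALTER TABLE my_contacts_alter_sketch
  ADD COLUMN contact_id INT NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (contact_id);

-- Look at the table to see what the command did to the existing rows.
SELECT * FROM my_contacts_alter_sketch;
```

FIRST is a MySQL keyword that places the new column at the front of the column list. Leave it off and the column is added at the end.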

Brain Power

Do you think that this will add values to the new contact_id column for records already in the table or only for newly inserted records? How can you check?

ALTER TABLE and add a PRIMARY KEY

Try the code yourself. Open your SQL terminal. USE the gregs_list database, and type in this command:

To see what happened to your table, try SELECT * FROM my_contacts;

Will Greg get his phone number column? Turn to Chapter 5 to find out.

Your SQL Toolbox

You've got Chapter 4 under your belt. Look at all the new tools you've added to your toolbox now! For a complete list of tooltips in the book, see Appendix C.

Sharpen your pencil

Let's make the clown table more atomic. Assuming you need to search on data in the appearance and activities columns, as well as last_seen, write down some better choices for columns.

There's no definite correct answer here.

The best you can do is to pull out things like gender, shirt color, pant color, hat type, musical instrument, transportation, balloons (yes or no for values), singing (yes or no for values), dancing (yes or no for values).

To make this table atomic, you've got to get those multiple activities into separate columns, and those multiple appearance features separated out.

Bonus points if you wanted to separate out the location column into address, city, and state!
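
One possible answer, sketched as a CREATE TABLE. The column names follow the list above, but the table name, the data types, and the 'Y'/'N' convention for the yes-or-no columns are assumptions.

```sql
-- One possible atomic layout; types and the Y/N convention are assumed.
DROP TABLE IF EXISTS clown_info_sketch;

CREATE TABLE clown_info_sketch
(
  name               VARCHAR(50),
  last_seen          VARCHAR(100),
  gender             CHAR(1),      -- 'M' or 'F'
  shirt_color        VARCHAR(20),
  pant_color         VARCHAR(20),
  hat_type           VARCHAR(20),
  musical_instrument VARCHAR(20),
  transportation     VARCHAR(20),
  balloons           CHAR(1),      -- 'Y' or 'N'
  singing            CHAR(1),      -- 'Y' or 'N'
  dancing            CHAR(1)       -- 'Y' or 'N'
);

-- With the old table this would need LIKE against the appearance and
-- activities columns. Here it is a pair of exact comparisons.
SELECT name, last_seen
FROM clown_info_sketch
WHERE gender = 'M'
  AND dancing = 'Y';
```

If you never search on, say, hat type, that column could just as well stay out. It all depends on how you will query the table.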

^[1] Some people think that RELATIONAL means multiple tables relating to each other. That's not correct.
