The 15-Minute SQL Tutorial - Automating System Administration with Perl

by David N. Blank-Edelman

Relational databases can be excellent tools for system administration. A relational database is accessed and administered using Structured Query Language (SQL) statements. As a result, it is a good idea for system administrators to learn at least the basics of SQL. The goal of this appendix is not to make you a full-time database programmer or even a real database administrator; that takes years of work and considerable expertise. However, we can look at enough SQL that you can begin to fake it. You may not be able to speak the language, but you’ll at least get the gist if someone speaks it at you, and you’ll know enough to go deeper into the subject if necessary. These basic building blocks are used extensively in Chapter 7, SQL Database Administration, where we integrate SQL and Perl.

System Administration with Perl, Second Edition book cover

This excerpt is from Automating System Administration with Perl, Second Edition . Thoroughly updated and expanded in its second edition to cover the latest operating systems, technologies, and Perl modules, Automating System Administration with Perl will help you perform your job with less effort. The second edition not only offers you the right tools for your job, but also suggests the best way to approach particular problems and securely automate pressing tasks.

buy button

SQL is a command language for performing operations on relational databases and their component parts. Tables are the component parts you’ll deal with most often. Their column and row structure makes them look a great deal like spreadsheets, but the resemblance is only surface-level. Table elements are not used to represent relationships to other elements—that is, table elements don’t hold formulas, just data. Most SQL statements are devoted to working with the data in these rows and columns, allowing the user to add, delete, select, sort, and relate it between tables.

Let’s go over some of the operators offered by SQL. If you want to experiment with the operators we’ll be discussing, you’ll need access to a SQL database. You may already have access to a server purchased from Oracle, Sybase, IBM, Microsoft, or elsewhere. If not, you can download an excellent open source database called MySQL from http://www.mysql.org. Another, simpler (no server required) open source database engine can be found at http://www.sqlite.org.

For this appendix, we’ll be using a mostly generic SQL dialect, though each database server has its own SQL quirks. SQL statements particular to a specific database implementation will be noted.

The SQL code that follows will be shown using the capitalization standard found in most SQL books. This standard capitalizes all reserved words in a statement.

Most of the example SQL code in this appendix will use a table that mirrors the flat-file machine database we saw in Chapter 5, TCP/IP Name and Configuration Services. As a quick refresher, Table D.1, “Our machine table” shows how that data looks in table form.

Prod: Not sure this will work as a table with the long values and 9 columns, but AU says it was OK in 1st edition?

Table D.1. Our machine table

name

ipaddr

aliases

owner

dept

bldg

room

manuf

model

shimmer

192.168.1.11

shim shimmy shimmydoodles

David Davis

Soft-ware

Main

909

Sun

M4000

bendir

192.168.1.3

ben bendoodles

Cindy Coltrane

IT

West

143

Apple

Mac Pro

sander

192.168.1.55

sandy micky mickydoo

Alex Rollins

IT

Main

1101

Dell

Optiplex 740

sulawesi

192.168.1.12

sula su-lee

Ellen Monk

Design

Main

1116

Apple

Mac Pro


Creating/Deleting Databases and Tables

In the beginning, the server will be empty and void of objects useful to us. Let’s create our database:

CREATE DATABASE sysadm ON userdev=10 LOG ON userlog=5
GO

This SQL statement creates a 10 MB database on the device userdev with a 5 MB log file on the userlog device. This statement is Sybase/Microsoft SQL Server-specific: database creation (when performed at all) takes place in different ways on different servers.[140] In this case, it is placing the database in a predefined storage device (an area defined as part of the storage allocation for the database server) and keeping the logging information (all of the info about the operations on the database and other housekeeping info) in a separate device.

Note

The GO command is used with some interactive database clients to indicate that the preceding SQL statement should be executed. (These clients also often provide additional commands beyond just plain SQL for working with the databases; e.g., MySQL has a DESCRIBE command for displaying information about tables.) It is not a SQL statement itself. Other databases require you to type a semicolon at the end of each statement. In the following examples, we’ll assume that GO or a semicolon will follow each individual SQL statement if you are using one of these clients. We’ll also be using the SQL commenting convention of -- for comments in the SQL code.

To remove this database, we can use the DROP command:

DROP DATABASE sysadm

Now let’s actually create an empty table to hold the information shown in Table D-1:

USE sysadm
-- Last reminder: you need to type GO or ; here (if you are using
-- an interactive client that requires this) before entering the
-- next statement
CREATE TABLE hosts (
  name      varchar(30)     NOT NULL,
  ipaddr    varchar(15)     NOT NULL,
  aliases   varchar(50)     NULL,
  owner     varchar(40)     NULL,
  dept      varchar(15)     NULL,
  bldg      varchar(10)     NULL,
  room      varchar(4)      NULL,
  manuf     varchar(10)     NULL,
  model     varchar(10)     NULL
)

First we indicate which database we wish to use (sysadm). The USE statement takes effect only if it is run separately before any other commands are executed; hence, it gets its own statement.

Next, we create a table by specifying the name, data type/length, and NULL/NOT NULL settings for each column. Let’s talk a little bit about data types.

It is possible to hold several different types of data in a database table, including numbers, dates, text, and even images and other binary data. When each column is created, the kind of data it will hold is specified. Our needs are modest, so this table is composed of a set of columns that hold simple strings of varchars (non-space-padded characters). Some SQL servers allow you to create user-defined aliases for data types, like ip_address or employee_id. User-defined data types are used in table creation to keep table structures readable and data formats consistent between columns across multiple tables.

The last parameter in our previous command declares a column to be either mandatory or optional. If this parameter is set to NOT NULL, a row cannot be added to the table if it lacks data in this column. In our example, we need a machine name and IP address for a machine record to be useful to us, so we declare those fields NOT NULL. All the rest (though highly desirable) are optional, so we declare them NULL. There are other constraints besides NULL/NOT NULL that can be applied to a column for the purposes of data consistency. For instance, in some SQL dialects, we could ensure that two machines cannot have the same name by changing this:

name      varchar(30)     NOT NULL,

to:

name      varchar(30)     NOT NULL CONSTRAINT unique_name UNIQUE,

where unique_name is the name of this particular constraint. Naming your constraints makes the error messages generated by constraint violations more useful. See your server documentation for other constraints that can be applied to a table.

Deleting entire tables from a database is considerably simpler than creating them:

USE sysadm
DROP TABLE hosts

Inserting Data into a Table

Now that we have an empty table, let’s look at two ways to add new data. Here’s the first form:

USE sysadm
INSERT hosts
   VALUES (
     'shimmer',
     '192.168.1.11',
     'shim shimmy shimmydoodles',
     'David Davis',
     'Software',
     'Main',
     '309',
     'Sun',
     'Ultra60'
   )

The first line tells the server we are going to work with objects in the sysadm database. The second line selects the hosts table and adds a row, one column at a time. This version of the INSERT command is used to add a complete row to the table (i.e., one with all columns filled in).[141]

To create a new row with a partial record we can specify the columns to fill, like so:

USE sysadm
INSERT hosts (name,ipaddr,owner)
   VALUES (
     'bendir',
     '192.168.1.3',
     'Cindy Coltrane'
    )

The INSERT command will fail if we try to insert a row that does not have all of the required (NOT NULL) columns.

INSERT can also be used to add data from one table to another; we’ll see this usage later. For the rest of our examples, assume that we’ve fully populated the hosts table using the first form of INSERT.

Querying Information

As an administrator, the SQL command you’ll probably use the most often is SELECT. SELECT is used to query information from a server. Before we talk about this command, a quick disclaimer: SELECT is a gateway into a whole wing of the SQL language. We’re only going to explore some of its simpler forms. There is an art to constructing good queries (and designing databases so they can be queried well), but more in-depth coverage of this topic is best found in books entirely devoted to SQL and databases.

The simplest SELECT form is used mostly for retrieving server- and connection-specific information. With this form, you do not specify a data source. Here are three examples:

-- Sybase/MS-SQL - retrieve server name
SELECT @@SERVERNAME

-- MySQL - retrieve current version
SELECT VERSION();

-- Oracle - retrieve STARTUP_TIME
SELECT STARTUP_TIME from v$instance;

These examples show significant differences in the retrieval of database-specific information.

Retrieving All of the Rows in a Table

To get at all of the data in our hosts table, we can use this SQL code:

USE sysadm
SELECT * FROM hosts

This returns all of the rows and columns, in the column order in which our table was created:

name      ipaddr        aliases                    owner           dept
bldg  room  manuf  model
--------- ------------- -------------------------- --------------- ---------
----- ----- ------ -------------
shimmer   192.168.1.11  shim shimmy shimmydoodles  David Davis     Software
Main  309   Sun    M4000
bendir    192.168.1.3   ben bendoodles             Cindy Coltrane  IT
West  143   Apple  Mac Pro
sander    192.168.1.55  sandy micky mickydoo       Alex Rollins    IT
Main  1101  Dell   Optiplex 740
sulawesi  192.168.1.12  sula su-lee                Ellen Monk      Design
Main  1116  Apple  Mac Pro

If we want to see specific columns, we just need to specify them by name:

USE sysadm
SELECT name,ipaddr FROM hosts

When we specify the columns by name they are returned in the order we specify them, independent of the order used when the table was created. For instance, to see IP addresses per building, we could use this query:

USE sysadm
SELECT bldg,ipaddr FROM hosts

This returns:

bldg       ipaddr
---------- ---------------
Main       192.168.1.11
West       192.168.1.3
Main       192.168.1.55
Main       192.168.1.12

Retrieving a Subset of the Rows in a Table

Databases wouldn’t be very interesting if you couldn’t retrieve a subset of your data. In SQL, we use the SELECT command and add a WHERE clause containing a conditional:

USE sysadm
SELECT * FROM hosts WHERE bldg='Main'

This shows:

name      ipaddr        aliases                    owner         dept      bldg
room  manuf  model
--------- ------------- -------------------------- ------------- --------- -----
----- ------ -------------
shimmer   192.168.1.11  shim shimmy shimmydoodles  David Davis   Software  Main
309   Sun    M4000
sander    192.168.1.55  sandy micky mickydoo       Alex Rollins  IT        Main
1101  Dell   Optiplex 740
sulawesi  192.168.1.12  sula su-lee                Ellen Monk    Design    Main
1116  Apple  Mac Pro

The set of available conditional operators for WHERE clauses contains the standard programming fare:

=      >      >=      <      <=      <>

Unlike Perl, SQL does not have separate string and numeric comparison operators.

Conditional operators can be combined with AND/OR and negated with NOT. We can test for an empty column using IS NULL, or for a non-empty column with IS NOT NULL. For instance, this SQL code will show all of the machines without owners listed in our table:

USE sysadm
SELECT name FROM hosts WHERE owner IS NULL

If you want to find all of the rows that have a column whose content is one of several specified values, you can use the IN operator to specify a list of values:

USE sysadm
SELECT name FROM hosts WHERE dept IN ('IT', 'Software')

This shows all of the machines in use in either the IT or software departments. SQL will also allow you to return rows that match a certain range of values (most useful with numeric or date values) with the BETWEEN operator. Here’s an example that shows all of the machines in the main building between the 10th and 19th floors (presuming you use a simple convention for room numbers):

USE sysadm
SELECT name FROM hosts
  WHERE (bldg = 'Main') AND
        (room BETWEEN '1000' AND '1999')

Finally, the WHERE clause can be used with LIKE to choose rows using weak pattern matching (weak, that is, in comparison to Perl’s regular expressions). For instance, this will select all of the machines that have the string “doodles” somewhere in their aliases:

USE sysadm
SELECT name FROM hosts WHERE aliases LIKE '%doodles%'

Table D.2, “LIKE wildcards” lists the supported LIKE wildcards.

Table D.2. LIKE wildcards

Wildcard

Meaning

Closest Perl regexp equivalent

%

Zero or more characters

.*

_

A single character

.

[]

A single character that is one of a specified set or range

[]


Some database servers have added extensions to SQL to allow for regular expression use in SELECTs. For instance, MySQL offers the REGEXP operator for use with SELECT. REGEXP doesn’t have all the power of Perl’s regular expression engine, but it offers a substantial increase in flexibility over the standard SQL wildcards.

Simple Manipulation of Data Returned by Queries

Three useful clauses for a SELECT statement are COUNT, DISTINCT, and ORDER BY. The first returns the number of rows retrieved:

USE sysadm
SELECT COUNT(*) FROM hosts

The second allows us to eliminate duplicate records returned by a query. If we want a list of all of the distinct manufacturers represented in our hosts table, we can use DISTINCT:

USE sysadm
SELECT DISTINCT manuf FROM hosts

The third clause allows us to specify the order of the returned results. If we want to see our data returned in a sorted order, we can use ORDER BY:

USE sysadm
SELECT name,ipaddr,dept,owner FROM hosts ORDER BY dept

Experienced database users often habitually add ORDER BY clauses to queries that return multiple rows because it makes dealing with the returned information easier.

SQL has several operators that can be used to modify the output returned by a query. They allow you to change column names, do summary and intra/intercolumn calculations, reformat how fields are displayed, perform subqueries, and a whole host of other things. Please see a dedicated SQL book for more detail on SELECT’s many clause operators.

Adding the Query Results to Another Table

A new table containing the results of a query can be created on the fly by using an INTO clause on some SQL servers:

USE sysadm
SELECT name,ipaddr INTO itmachines FROM hosts WHERE dept = 'IT'

This statement works just like those we’ve seen previously, except that the results of the query are added to another table called itmachines. With some servers, this table is created on the fly if it does not exist. You can think of this operator clause as the equivalent of the > operator in most Unix- and Windows-based operating system command-line shells.

Note

Some database servers (like MySQL[142]) do not support SELECT INTO; they require the use of different syntax. For example, Oracle uses something like this:

CREATE TABLE COPY AS SELECT name,ipaddr FROM hosts WHERE dept = 'IT'

Some other servers instead use an INSERT command to perform this action. Still others, such as Microsoft SQL Server and Sybase, require that a special flag be set on a database before SELECT INTO can be used within that database, or the command will fail.

Changing Table Information

Our working knowledge of the SELECT command comes into play with other commands as well. For instance, the INSERT command we saw earlier can also take a SELECT clause. This allows us to insert query information into an existing table. If our software department were to merge with IT, we could add their machines to the itmachines table:

USE sysadm
INSERT itmachines
  SELECT name,ipaddr FROM hosts
  WHERE dept = 'Software'

If we want to change any of the rows in our table, we can use the UPDATE command. For example, if all of the departments in the company moved into a single facility called Central, we could change the name of the building in all rows like so:

USE sysadm
UPDATE hosts
  SET bldg = 'Central'

It’s more likely that we’ll need to change only certain rows in a table. For that task, we use the handy WHERE clause we saw when discussing the SELECT operator:

USE sysadm
UPDATE hosts
  SET dept = 'Development'
  WHERE dept = 'Software'

That changed the name of the Software department to Development. This moves the machine called bendir to our Main building:

USE sysadm
UPDATE hosts
  SET bldg = 'Main'
  WHERE name = 'bendir'

If we want to remove a row or set of rows from a table instead of updating them, we can use the DELETE command:

USE sysadm
DELETE FROM hosts
  WHERE bldg = 'East'

While there’s no standardized way to undo a straight DELETE operation,[143] you can gain some safety using transactions (outside the scope of this appendix). In many cases you can run the DELETE command as a SELECT first to gain an understanding of just what effect the DELETE will have. Still, be careful with these operations.

Relating Tables to Each Other

Relational databases offer many ways to forge connections between the data in two or more tables. This process is known as “joining” the tables. Joins can get complex quickly, given the number of query possibilities involved and the fine control the programmer has over the data that is returned. There are different flavors of joints (inner, outer, etc.) but we’re not going to get into those here. If you are interested in this level of detail, your best bet is to seek out a book devoted to SQL.

Here is one example of a join in action. For this example we’ll use another table called contracts, which contains information on the maintenance contracts for each of our machines. That table is shown in Table D.3, “Our contracts table”.

Table D.3. Our contracts table

name

servicevendor

startdate

enddate

bendir

IBM

09-09-2005

06-01-2008

sander

Dell

03-14-2008

03-14-2009

shimmer

Sun

12-12-2008

12-12-2009

sulawesi

Apple

11-01-2005

11-01-2008


Here’s one way to relate our hosts table to the contracts table using a join:

USE sysadm
SELECT contracts.name,servicevendor,enddate,bldg,room
  FROM contracts, hosts
  WHERE contracts.name = hosts.name

The easiest way to understand this code is to read it from the middle out. FROM contracts, hosts tells the server that we wish to relate the contracts and hosts tables. WHERE contracts.name = hosts.name says we will match a row in contracts to a row in hosts based on the contents of the name field in each table. Note that we say contracts.name because we need to distinguish which name field we are using (the one on the contracts table). Finally, the SELECT line specifies the columns we wish to appear in our output.

SQL Stragglers

Before we close this tutorial section, there are a few more advanced SQL topics you may encounter in your travels.

Views

Most SQL servers allow you to create different views of a table. Views are like magic permanent SELECT queries. Once you create a view using a special SELECT query, the specification of your query sticks around. Each time you access anything from the view, the original query is run to provide that information. Views can be queried like any other table. Modifications to a view, with a few restrictions, are propagated back to the original table or tables.

Note I said tables. Here’s where the magic of views comes in: a view on a table can be created that consists of a join between that table and another. This view behaves as one large virtual table. Changes to this view are propagated back to the original tables that are part of the join that created the view.

A view can also be created with a new column consisting of calculations performed between other columns in that table, almost like in a spreadsheet. Views are also useful for more mundane purposes, such as query simplification (e.g., you may be able to SELECT fewer columns) and data restructuring (e.g., table users see a view of the data that doesn’t change, even if other columns in the underlying table structure are modified).

Here’s a view-creation example that demonstrates query simplification:

USE sysadm
CREATE VIEW ipaddr_view AS SELECT name, ipaddr FROM hosts

Now we can use a very simple query to get back just the information we need:

USE sysadm
SELECT * FROM ipaddr_view

The result of this query is:

name                           ipaddr
------------------------------ ---------------
shimmer                        192.168.1.11
bendir                         192.168.1.3
sander                         192.168.1.55
sulawesi                       192.168.1.12

Like tables, views are dropped using a form of the DROP command:

USE sysadm
DROP VIEW ipaddr_view

Dropping the view has no effect on the underlying data tables.

Cursors

In all the queries we’ve seen thus far, we’ve asked the server to hand us back all of the results once the query has completed. But sometimes it is preferable to receive the answer to a query one line at a time. This is most often the case when embedding SQL queries in other programs. If your query returns tens of thousands of lines, chances are pretty good that you’ll want to process the results one line at a time, rather than storing them all in memory for later use. SQL programming in Perl often uses this line-at-a-time method. Here’s a small native-SQL program that demonstrates cursor use on a Sybase or Microsoft SQL Server:

USE sysadm
-- declare our variables
DECLARE @hostname varchar(30)
DECLARE @ip varchar(15)

-- declare our cursor
DECLARE hosts_curs CURSOR FOR SELECT name,ipaddr FROM hosts

-- open this cursor
OPEN hosts_curs

-- iterate over the table, fetching rows one at a time,
-- until we receive an error
FETCH hosts_curs INTO @hostname,@ip
WHILE (@@fetch_status = 0)
  BEGIN
     PRINT "----"
     PRINT @hostname
     PRINT @ip
     FETCH hosts_curs INTO @hostname,@ip
  END

-- close the cursor (not strictly necessary when followed
-- by a DEALLOCATE)
CLOSE hosts_curs

-- undefine cursor def
DEALLOCATE hosts_curs

This produces the following output:

----
shimmer
192.168.1.11
----
bendir
192.168.1.3
----
sander
192.168.1.55
----
sulawesi
192.168.1.12

Stored Procedures

Most database systems allow you to upload SQL code to the server, where it is stored in an optimized, post-parsed form for faster execution. Such uploads are known as stored procedures. Stored procedures are often a critical component of SQL for administrators, because large parts of server administration for some servers rely on them. For example, to change the owner of the sysadm database in Sybase, you might do this:

USE sysadm
sp_changedbowner "jay"

Some databases also support something called “triggers.” Triggers are stored procedures that automatically fire when some event takes place in the database (e.g., when a row gets INSERTed). Each database vendor implements triggers slightly differently, so check the documentation of the database you are using for the details on how to use CREATE TRIGGER and DROP TRIGGER.

Now that you’ve seen the basics of SQL, you’re ready to tackle Chapter 7, SQL Database Administration.



[140] In fact, different servers even have different ideas about what the meaning of “database” is. The term is broader for an Oracle DBA than it is for a MySQL DBA.

[141] Experienced SQL users would probably suggest you always specify the column destinations for each piece of data (even when inserting a complete row), as per the next example. This makes the INSERT statement more robust, because it isn’t prone to errors in order (e.g., if you should add another field to the database).

[142] Just to be clear: MySQL 5.x does have a SELECT .. INTO, but it dumps data to a regular file, not to a database, as we’ve been discussing. For MySQL, you’ll want to use INSERT .. INTO instead.

[143] Oracle 10g and beyond offer a flashback facility that can undo DELETE and DROP operations, depending on the amount of data in play and how much the database has changed since the destructive operations were performed.

If you enjoyed this excerpt, buy a copy of Automating System Administration with Perl, Second Edition .