BUY THIS BOOK
Add to Cart

Print Book $34.95


Add to Cart

Print+PDF $45.44

Add to Cart

PDF $27.99

Safari Books Online

What is this?

Add to UK Cart

Print Book £24.95

What is this?

Looking to Reprint or License this content?


Oracle SQL*Loader: The Definitive Guide
Oracle SQL*Loader: The Definitive Guide By Jonathan Gennick, Sanjay Mishra
April 2001
Pages: 272

Cover | Table of Contents | Colophon


Table of Contents

Chapter 1: Introduction to SQL*Loader
SQL*Loader is an Oracle-supplied utility that allows you to load data from a flat file into one or more database tables. That's it. That's the sole reason for SQL*Loader's existence.
The basis for almost everything you do with SQL*Loader is a file known as the control file . The SQL*Loader control file is a text file into which you place a description of the data to be loaded. You also use the control file to tell SQL*Loader which database tables and columns should receive the data that you are loading.
Do not confuse SQL*Loader control files with database control files. In a way, it's unfortunate that the same term is used in both cases. Database control files are binary files containing information about the physical structure of your database. They have nothing to do with SQL*Loader. SQL*Loader control files, on the other hand, are text files containing commands that control SQL*Loader's operation.
Once you have a data file to load and a control file describing the data contained in that data file, you are ready to begin the load process. You do this by invoking the SQL*Loader executable and pointing it to the control file that you have written. SQL*Loader reads the control file to get a description of the data to be loaded. Then it reads the input file and loads the input data into the database.
SQL*Loader is a very flexible utility, and this short description doesn't begin to do it justice. The rest of this chapter provides a more detailed description of the SQL*Loader environment and a summary of SQL*Loader's many capabilities.
When we speak of the SQL*Loader environment, we are referring to the database, the SQL*Loader executable, and all the different files that you need to be concerned with when using SQL*Loader. These are shown in Figure 1-1.
Figure 1-1: The SQL*Loader environment
The functions of the SQL*Loader executable, the database, and the input data file are rather obvious. The SQL*Loader executable does the work of reading the input file and loading the data. The input file contains the data to be loaded, and the database receives the data.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The SQL*Loader Environment
When we speak of the SQL*Loader environment, we are referring to the database, the SQL*Loader executable, and all the different files that you need to be concerned with when using SQL*Loader. These are shown in Figure 1-1.
Figure 1-1: The SQL*Loader environment
The functions of the SQL*Loader executable, the database, and the input data file are rather obvious. The SQL*Loader executable does the work of reading the input file and loading the data. The input file contains the data to be loaded, and the database receives the data.
Although Figure 1-1 doesn't show it, SQL*Loader is capable of loading from multiple files in one session. You'll read more about this in Chapter 2. When multiple input files are used, SQL*Loader will generate multiple bad files and discard files—one set for each input file.
The SQL*Loader control file is the key to any load process. The control file provides the following information to SQL*Loader:
  • The name and location of the input data file
  • The format of the records in the input data file
  • The name of the table or tables to be loaded
  • The correspondence between the fields in the input record and the columns in the database tables being loaded
  • Selection criteria defining which records from the input file contain data to be inserted into the destination database tables.
  • The names and locations of the bad file and the discard file
Some of the items shown in this list may also be passed to SQL*Loader as command-line parameters. The name and location of the input file, for example, may be passed on the command line instead of in the control file. The same goes for the names and locations of the bad files and the discard files.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
A Short SQL*Loader Example
This section contains a short example showing how SQL*Loader is used. For this example, we'll be loading a file of geographic place names taken from the United States Geological Survey's (USGS) Geographic Name Information System (GNIS).
Learn more about GNIS data or download it for yourself by visiting http://mapping.usgs.gov/www/gnis/. The specific data file used for this example is also available from http://www.oreilly.com/catalog/orsqlloader and http://gennick.com/sqlldr.
The particular file used for this example contains the feature name listing for the State of Michigan. It's a delimited text file containing the official names of the many thousands of lakes, streams, waterfalls, and other geographic features in the state. The following example shows three records from that file. The lines wrap on the printed page in this book, but in the file each name is on its own line:
"MI","Agate Falls","falls","Ontonagon","26","131","462851N","0890527W",
"46.48083","-89.09083","","","","","","","Trout Creek"

"MI","Agate Harbor","bay","Keweenaw","26","083","472815N","0880329W",
"47.47083","-88.05806","","","","","","","Delaware"

"MI","Agate Point","cape","Keweenaw","26","083","472820N","0880241W",
"47.47222","-88.04472","","","","","","","Delaware"
As you can see, the data in the file is comma-delimited, and each field is enclosed within double quotes. Table 1-1 shows the contents and maximum length of each field.
Table 1-1: Fields in the GNIS Feature Names File
Field Number
Maximum Length
Contents
1
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
SQL*Loader's Capabilities
SQL*Loader is very flexible, and the example in the previous section shows only a small amount of what can be done using the utility. Here are the major SQL*Loader capabilities that you should be aware of:
  • SQL*Loader can read from multiple input files in a single load session.
  • SQL*Loader can handle files with fixed-length records, variable-length records, and stream-oriented data.
  • SQL*Loader supports a number of different datatypes, including text, numeric, zoned decimal, packed decimal, and various machine-specific binary types.
  • Not only can SQL*Loader read from multiple input files, but it can load that data into several different database tables, all in the same load session.
  • SQL*Loader allows you to use Oracle's built-in SQL functions to manipulate the data being read from the input file.
  • SQL*Loader includes functionality for dealing with whitespace, delimiters, and null data.
  • In addition to standard relational tables, SQL*Loader can load data into object tables, varying arrays (VARRAYs), and nested tables.
  • SQL*Loader can load data into large object (LOB) columns.
  • SQL*Loader can handle character set translation between the input data file and the database.
The capabilities in this list describe the types of data that SQL*Loader can handle, and what SQL*Loader can do to with that data. SQL*Loader also implements some strong, performance-related features. SQL*Loader can do direct path loads, which bypass normal SQL statement processing, and which may yield handsome performance benefits. SQL*Loader can also do parallel loads and even direct-path parallel loads; direct path parallel loads allow you to maximize throughput on multiple CPU systems. You'll read more about these performance-related features in Chapter 9, and in Chapter 10.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Issues when Loading Data
There are a number of issues that you need to be concerned about whenever you use SQL*Loader to load data into your database—indeed, you need to be concerned about these whenever you load data, period. First, there's the ever-present possibility that the load will fail in some way before it is complete. If that happens, you'll be left with some data loaded, and some data not loaded, and you'll need a way to back out and try again. Other SQL*Loader issues include transaction size, data validation (including referential integrity), and data transformation. Transaction size is partly a performance issue, but it also has an impact on how much data you need to reload in the event that a load fails. Data validation and referential integrity both relate to the need for clean, reliable data.
There are really only two fundamental ways that you can recover from a failed load. One approach is to delete all the data that was loaded before the failure occurred, and simply start over again. Of course, you need to fix whatever caused the failure to occur before you restart the load. The other approach is to determine how many records were loaded successfully, and to restart the load from that point forward. Regardless of which method you choose, you need to think things through before you start a load.
Deleting data and restarting a load from scratch really doesn't require any special functionality on the part of SQL*Loader. The important thing is that you have a reliable way to identify the data that needs to be deleted. SQL*Loader does, however, provide support for continuing an interrupted load from the point where a failure occurred. Using the SKIP command-line parameter, or the SKIP clause in the control file, you can tell SQL*Loader to skip over records that were already processed in order to have the load pick up from where it left off previously. Chapter 6, describes the process for continuing a load in detail, and some of the issues you'll encounter. It's a chapter worth reading, because there are some caveats and gotchas, and you'll want to learn about those before you have a failure, not afterwards.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Invoking SQL*Loader
On Unix systems, the command used to invoke SQL*Loader is sqlldr . On Windows systems running Oracle8i, release 8.1 or higher, the command is also sqlldr. Prior to release 8.1, the SQL*Loader command on Windows systems included the first two digits of the Oracle release number. Thus you had sqlldr80 (Oracle8, release 8.0), sqlldr73 (Oracle7, release 7.3), and so forth.
SQL*Loader can be invoked in one of three ways:
sqlldr

sqlldr keyword=value [keyword=value ...]

sqlldr value [value ...]
Issuing the sqlldr command by itself results in a list of valid command-line parameters being displayed. Command-line parameters are usually keyword/value pairs, and may be any combination of the following:
USERID={username[/password][@net_service_name]|/}
CONTROL=control_file_name
LOG=path_file_name 
BAD=path_file_name
DATA=path_file_name
DISCARD=path_file_name
DISCARDMAX=logical_record_count
SKIP=logical_record_count
SKIP_INDEX_MAINTENANCE={TRUE | FALSE}
SKIP_UNUSABLE_INDEXES={TRUE | FALSE}
LOAD=logical_record_count
ERRORS=insert_error_count
ROWS=rows_in_bind_array
BINDSIZE=bytes_in_bind_array
SILENT=[(]keyword[,keyword...][)]
DIRECT={TRUE | FALSE}
PARFILE=path_file_name
PARALLEL={TRUE | FALSE}
READSIZE=bytes_in_read_buffer
FILE=database_datafile_name
         
Command-line parameters may be passed by position instead of by keyword. The rules for doing this are described at the end of the next section.
The SQL*Loader parameter descriptions are as follows:
USERID = {username[/password] [@net_service_name]|/}
Specifies the username and password to use when connecting to the database. The net_service_name parameter optionally allows you to connect to a remote database. Use a forward-slash character ( / ) to connect to a local database using operating system authentication. On Unix systems, you may want to omit the password and allow SQL*Loader to prompt you for it. If you omit both the username and the password, SQL*Loader will prompt you for both.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: The Mysterious Control File
The control file is the key to SQL*Loader. Understanding the control file is like having the keys to the kingdom. You'll not only be able to get work done, you'll be able to leverage all of SQL*Loader's built-in capabilities. As a result, you'll work less, and SQL*Loader will work more.
This chapter describes the three parts of the control file:
  • The LOAD statement
  • Command-line parameters (the OPTIONS command)
  • Data
The LOAD statement is present in any SQL*Loader control file. Command-line parameters and data are optional.
Before getting into the details of the LOAD, it's worth taking the time to understand some things about control file syntax. There are various issues with respect to formatting, case sensitivity, special characters, and reserved words that you should at least be aware of. Usually you don't need to think much about any of these issues, but sooner or later you will find all this information to be helpful.
Control file syntax is free format in the sense that you can insert any amount of whitespace between any two syntax elements. A syntax element could be, for instance, a keyword, or it could be a value that you supply. Carriage returns, line feeds, and tabs, as well as space characters, are all considered whitespace. Thus, the following two LOAD statements are considered identical even though they are formatted differently:
LOAD DATA
INFILE 'mi_deci.'
BADFILE 'mn_deci.bad' DISCARDFILE 'mn_deci.dis'
TRUNCATE INTO TABLE gfn_gnis_feature_names 
WHEN gfn_feature_type='lake' (
gfn_state_abbr CHAR TERMINATED BY "," ENCLOSED BY '"', 
gfn_feature_name CHAR TERMINATED BY "," ENCLOSED BY '"', 
gfn_feature_type CHAR TERMINATED BY "," ENCLOSED BY '"')

LOAD DATA
INFILE 'mi_deci.' 
   BADFILE 'mn_deci.bad' 
   DISCARDFILE 'mn_deci.dis'
TRUNCATE 
INTO TABLE gfn_gnis_feature_names
   WHEN gfn_feature_type='lake' (
   gfn_state_abbr CHAR TERMINATED BY "," ENCLOSED BY '"',
   gfn_feature_name CHAR TERMINATED BY "," ENCLOSED BY '"',
   gfn_feature_type CHAR TERMINATED BY "," ENCLOSED BY '"'
   )
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Syntax Rules
Before getting into the details of the LOAD, it's worth taking the time to understand some things about control file syntax. There are various issues with respect to formatting, case sensitivity, special characters, and reserved words that you should at least be aware of. Usually you don't need to think much about any of these issues, but sooner or later you will find all this information to be helpful.
Control file syntax is free format in the sense that you can insert any amount of whitespace between any two syntax elements. A syntax element could be, for instance, a keyword, or it could be a value that you supply. Carriage returns, line feeds, and tabs, as well as space characters, are all considered whitespace. Thus, the following two LOAD statements are considered identical even though they are formatted differently:
LOAD DATA
INFILE 'mi_deci.'
BADFILE 'mn_deci.bad' DISCARDFILE 'mn_deci.dis'
TRUNCATE INTO TABLE gfn_gnis_feature_names 
WHEN gfn_feature_type='lake' (
gfn_state_abbr CHAR TERMINATED BY "," ENCLOSED BY '"', 
gfn_feature_name CHAR TERMINATED BY "," ENCLOSED BY '"', 
gfn_feature_type CHAR TERMINATED BY "," ENCLOSED BY '"')

LOAD DATA
INFILE 'mi_deci.' 
   BADFILE 'mn_deci.bad' 
   DISCARDFILE 'mn_deci.dis'
TRUNCATE 
INTO TABLE gfn_gnis_feature_names
   WHEN gfn_feature_type='lake' (
   gfn_state_abbr CHAR TERMINATED BY "," ENCLOSED BY '"',
   gfn_feature_name CHAR TERMINATED BY "," ENCLOSED BY '"',
   gfn_feature_type CHAR TERMINATED BY "," ENCLOSED BY '"'
   )
The second LOAD statement is obviously easier to follow. When writing control files, it's best to use whitespace and indentation as tools to make your control files readable. In this example, the INFILE and INTO TABLE clauses are both subordinate to LOAD DATA, and they appear flush-left underneath those keywords. The BADFILE and DISCARDFILE clauses are related to the input file, and thus have been indented underneath INFILE. Similarly, the WHEN clause is part of the INTO TABLE clause, and has been indented accordingly.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The LOAD Statement
When you run SQL*Loader, it's ultimately the LOAD statement that tells SQL*Loader what data to load, where to get it from, where to put it, and how to interpret it. You saw a simple example of the LOAD statement in Chapter 1. However, LOAD statements can be much longer and more complex than what you've seen so far.
LOAD statements are made up of the following components:
  • The LOAD keyword that begins the statement
  • One or more INFILE clauses to identify the input data files
  • One or more INTO TABLE clauses to identify the target database tables
  • Optionally, a CONCATENATE clause to specify the rules for concatenating multiple physical records into one logical record
  • Other miscellaneous clauses
In the sections that follow, you'll learn how to write the clauses that specify input files and target tables for the data that you want to load. Chapter 3 shows you how to describe the data in your input files. Subsequent chapters cover other clauses that are used for validation, data transformation, continuation, and other SQL*Loader features.
The following diagram shows the basic syntax of the LOAD statement:
[UNRECOVERABLE | RECOVERABLE] {LOAD | CONTINUE_LOAD} [DATA]
   [CHARACTERSET character_set] [BYTEORDER]
   [INFILE clause [INFILE clause...]]
   [MAXRECORDSIZE bytes
   [READBUFFERS integer]
   [INSERT | APPEND | REPLACE | TRUNCATE]
   [concatenate_rules]
   [PRESERVE BLANKS]
   INTO TABLE clause [INTO TABLE clause...]
   [BEGINDATA]
The elements in the syntax are as follows:
UNRECOVERABLE
Specifies that the load not be recorded in the database redo log. This improves performance, but requires you to reload the data in the event that any affected database files are lost and subsequently restored and recovered. This option is only applicable to direct path loads. See Chapter 9, for more details on unrecoverable loads.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Command-Line Parameters in the Control File
In Chapter 1 you read about the SQL*Loader command and the parameters you can use with it. Certain command-line parameters may also be specified in the control file. You accomplish this using the OPTIONS command, which must precede your LOAD statement. The syntax for the OPTIONS command is:
OPTIONS (parameter=value[,parameter=value...])
The following parameters are valid with the OPTIONS command:
SKIP = logical_record_count
LOAD = logical_record_count
ERRORS = insert_error_count
ROWS = rows_in_bind_array
BINDSIZE = bytes_in_bind_array
SILENT = [(keyword [,keyword...] [)]
DIRECT = {TRUE | FALSE}
PARALLEL = {TRUE | FALSE}
READSIZE = bytes_in_read_buffer
Valid keywords for the SILENT parameter include ALL, DISCARDS, ERRORS, FEEDBACK, HEADER, and PARTITIONS.
You can find a detailed description of these parameters in Chapter 1. Parameter/value pairs listed in the OPTIONS statement should be separated by commas. The OPTIONS command in the following control file specifies that a silent load be performed, and sets the maximum error count to 999,999:
OPTIONS (SILENT=ALL, ERRORS=999999)
LOAD DATA
INFILE 'mi_deci.'
TRUNCATE 
INTO TABLE GNIS.GFN_GNIS_FEATURE_NAMES
   (
   gfn_state_abbr CHAR
      TERMINATED BY "," ENCLOSED BY '"',
   gfn_feature_name CHAR
      TERMINATED BY "," ENCLOSED BY '"',
   gfn_feature_type CHAR
      TERMINATED BY "," ENCLOSED BY '"',
   gfn_county_name
      CHAR TERMINATED BY "," ENCLOSED BY '"'
   )
Not all parameters may be set using the OPTIONS command. Strangely enough, however, you can specify a value for any valid command-line parameter in the OPTIONS command, and SQL*Loader will not generate an error. For example, you could write the following OPTIONS command:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Placing Data in the Control File
SQL*Loader gives you the option of placing the data to be loaded into the control file. This is sometimes useful when you have data to distribute to multiple sites, because it minimizes the number of files that you need to pass around. Instead of sending separate data and control files, you can place everything into one control file and send that.
We prefer to keep data in its own file, separate from the control file. In our experience, that ultimately provides for more flexibility.
In order to place data into your control file, you must do the following:
  • Place the BEGINDATA keyword at the end of your LOAD statement.
  • Start your data on the line following BEGINDATA.
  • Use INFILE * to tell SQL*Loader to read data from the control file.
The BEGINDATA keyword marks the end of your LOAD statement in cases where you place data into the control file. The first line following BEGINDATA then becomes the first input data record. To tell SQL*Loader to look in the control file as opposed to an external file, use an asterisk as the file name in your INFILE clause. For example:
LOAD DATA
INFILE *
TRUNCATE 
INTO TABLE GFN_GNIS_FEATURE_NAMES
   (
   gfn_state_abbr CHAR
      TERMINATED BY "," ENCLOSED BY '"',
   gfn_feature_name CHAR
      TERMINATED BY "," ENCLOSED BY '"',
   gfn_feature_type CHAR
      TERMINATED BY "," ENCLOSED BY '"',
   gfn_county_name
      CHAR TERMINATED BY "," ENCLOSED BY '"'
   )
BEGINDATA
"MI","2 Lake","lake","Marquette"
"MI","3 Lake","lake","Marquette"
"MI","8 Lake","lake","Marquette"
When your data is in the control file, the default file names for the bad and discard files are derived from the control file name. Your bad file name will be the control file name, but with the extension .bad. Your discard file name will be the control file name, but with the extension .dis. If you specify a path for your control file name because the control file is in a different directory, that same path will be used for the bad and discard files.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 3: Fields and Datatypes
The subject of fields and datatypes cuts to the very core of SQL*Loader's capabilities. The whole purpose of the tool is to extract information from a file and load it into a database. SQL*Loader's power comes from its ability to handle a wide range of datatypes, and also from its flexibility in describing and manipulating the different fields in an input record.
A field specification is that portion of an INTO TABLE clause describing a field to be extracted from the input record. Syntactically, each INTO TABLE clause contains a comma-delimited list of field specifications in the manner shown here:
INTO TABLE table_name
   (
   field_specification,
   field_specification,
   field_specification,
   ...
   )
This comma-delimited list of field specifications is always the last component of an INTO TABLE clause, and it's always enclosed within parentheses. Field specifications typically describe fields to be extracted from records in a data file, but they can also describe values that you want SQL*Loader to generate for you. SQL*Loader supports the following five field types:
  • Scalar fields
  • Filler fields
  • Generated fields
  • Collection fields
  • Column object fields
Scalar fields are the type you deal with most often on a daily basis. Character strings and numbers are typical examples of scalar fields. Filler fields function somewhat as placeholders. They are fields that you define for use within SQL*Loader, but that are not ultimately loaded into the database. Generated fields represent values such as a record number that SQL*Loader can generate for you. You'll read about all these types in this chapter.
Collection and column object fields correspond to Oracle8i 's object features. Collection fields allow you to load nested tables and varying arrays. Column object fields allow you to load object columns. These more complex field types are big subjects in themselves, and are covered in Chapter 11.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Field Specifications
A field specification is that portion of an INTO TABLE clause describing a field to be extracted from the input record. Syntactically, each INTO TABLE clause contains a comma-delimited list of field specifications in the manner shown here:
INTO TABLE table_name
   (
   field_specification,
   field_specification,
   field_specification,
   ...
   )
This comma-delimited list of field specifications is always the last component of an INTO TABLE clause, and it's always enclosed within parentheses. Field specifications typically describe fields to be extracted from records in a data file, but they can also describe values that you want SQL*Loader to generate for you. SQL*Loader supports the following five field types:
  • Scalar fields
  • Filler fields
  • Generated fields
  • Collection fields
  • Column object fields
Scalar fields are the type you deal with most often on a daily basis. Character strings and numbers are typical examples of scalar fields. Filler fields function somewhat as placeholders. They are fields that you define for use within SQL*Loader, but that are not ultimately loaded into the database. Generated fields represent values such as a record number that SQL*Loader can generate for you. You'll read about all these types in this chapter.
Collection and column object fields correspond to Oracle8i 's object features. Collection fields allow you to load nested tables and varying arrays. Column object fields allow you to load object columns. These more complex field types are big subjects in themselves, and are covered in Chapter 11.
SQL*Loader scalar fields consist of numbers, dates, character strings, and large object (LOB) files. In most cases, a scalar field is read from an input record and loaded into a single database column. In the case of a LOB file, all the data in one operating system file is read and stored in a database column of type CLOB, BLOB, or BFILE. Information on loading LOB files can be found in Chapter 10. This chapter concentrates on the more traditional scalar datatypes.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Datatypes
SQL*Loader supports a wide variety of datatypes, some of them more useful than others. These datatypes are grouped into the following two families:
  • Portable datatypes
  • Nonportable datatypes
It's critical that you understand the distinction between portable and nonportable datatypes. The distinction is simple. Nonportable datatypes are hardware-specific: you typically must read a file from the same hardware on which it was generated. Portable datatypes are for use with textual data, which can usually be transferred from one machine to another without affecting the semantics, or meaning, of the data.
The other key thing to understand about SQL*Loader datatypes is that you use them to describe the fields in the data files that SQL*Loader reads. Their names do not correspond to the datatype names of the columns that you are loading. In fact, the SQL*Loader datatypes do not need to correspond at all to the database datatypes of the columns that you are loading. SQL*Loader interprets input data using the SQL*Loader datatypes you specify. It then converts each value to the appropriate database datatype. You never have to tell SQL*Loader the database datatype, because SQL*Loader derives that information directly from the Oracle data dictionary.
While we can't point to any statistics to back up our claim, we think it's safe to say that SQL*Loader's portable datatypes are the most commonly used. In our experience, SQL*Loader is most often used to load data from a text file. This entails the use of the portable datatypes, which are listed in Table 3-1.
Table 3-1: SQL*Loader Portable Datatypes
Type
Comments
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 4: Loading from Fixed-Width Files
Fixed-width data is one of the two overarching classifications of data that you deal with when using SQL*Loader. Fixed-width data files consist of columnar data—each field has a specific length and position that never varies from one record to the next. The other major classification, delimited data, discussed in Chapter 5, consists of variable-length fields separated by delimiter characters. Describing the fields in a fixed-width record is mostly a matter of specifying their positions and their datatypes. Occasionally, things are actually that simple. More often, you'll encounter issues such as these:
  • How do you represent null values?
  • What do you do with records that are shorter than they should be?
  • How do you handle trailing whitespace at the end of a record?
The issue of null values is probably the trickiest of these issues. Null refers to the absence of a value, yet in a fixed-width data file you always have some value for each field. Other issues occur, perhaps less frequently. At times, you may be faced with the need to concatenate multiple physical records into one long logical record. You may also find yourself faced with fixed-width fields that contain delimited data.
This chapter shows you how to define fixed-width data, and it describes the tools SQL*Loader provides that allow you to deal with the issues listed here. You'll learn how to define nulls, deal with short records and whitespace, concatenate multiple records together, and pull apart delimited, fixed-width fields.
COBOL programs almost universally generate fixed-width data, and they may in fact represent the greatest source of such data. We have certainly written enough COBOL programs over the years that transfer data between systems using flat files. If you are loading data that comes from a COBOL source, you'll most likely be working with the following SQL*Loader datatypes:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Common Datatypes Encountered
COBOL programs almost universally generate fixed-width data, and they may in fact represent the greatest source of such data. We have certainly written enough COBOL programs over the years that transfer data between systems using flat files. If you are loading data that comes from a COBOL source, you'll most likely be working with the following SQL*Loader datatypes:
CHAR
DATE
DECIMAL
ZONED
If you need a refresher on SQL*Loader datatypes, refer back to Chapter 3.
Of course, COBOL doesn't have a lock on fixed-width data. Binary files containing data in hardware-specific numeric formats represent another source of fixed-width data. Such files could be generated by a C program, and might include the following SQL*Loader datatypes:
INTEGER
SMALLINT
FLOAT
DOUBLE
BYTEINT
Character data can appear within a binary file, and in the case of C-style strings will be delimited by X'00' characters. This can lead to some interesting complications in terms of blending fixed-width and delimited data.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Specifying Field Positions
When dealing with columnar data, each field has a starting position, an ending position, and a length. As in Chapter 3, you can specify these values for each field, or you can specify some values and allow SQL*Loader to calculate others. For each field, you can choose to specify one of the following combinations of values:
  • Starting and ending position
  • Starting position and length
  • Field type and length
Typically it's best to be consistent and use one approach for all fields. This is especially true when you are supplying the field type and length for each field. When you do that, you're using relative positioning—the beginning position of each field depends on the ending position of the previous field, and it rarely makes sense to mix relative positioning with absolute positioning.
We are going to show some examples of each method, but before we do that, let's look at some representative columnar data:
MI Tie Hill Lookout Tower                            tower     Alger
MI Slowfoot Lake                                     lake      Alger
MI Stella, Lake                                  790 lake      Alger
MI Dam Lake                                          lake      Alger
MI Camp K Lake                                       lake      Alger
MI Tie Lake                                      790 lake      Alger
MI Mathias Township Park                         862 park      Alger
MI Trenary Cemetery                              878 cemetery  Alger
MI Temple Lake                                       lake      Alger
MI Mantila Camp                                  968 locale    Alger
MI Forest Lake Lookout Tower                    1036 tower     Alger
...
This is a list of geographic feature names for Alger County, Michigan. The columns are as follows:
  • State code
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Handling Anomalous Data
When you're dealing with columnar data, there are some anomalies and ambiguous situations that you need to deal with. Take a close look at the following data:
MI Sitka Lake                                    902 lake      Alger
MI Sevenmile Lake                                    lake
MI Little Beaver Creek                               stream    Alger
MI Traunik                                       925 ppl
Now, given this data, ask yourself these questions:
  • Should the missing elevation values be represented in the database as zeros or as nulls?
  • What should be done with records, such as those missing the county name, that are too short?
  • What should be done about trailing spaces in fields? Is it "Traunik" (with no trailing spaces) or "Traunik . . ." (with 37 trailing spaces to fill out the 44-character string)?
The answers to these questions are not always as obvious as you might think. While you almost certainly want a missing elevation to be represented as a null, what if you are dealing with a missing dollar amount? Should that become a null or a zero? Application considerations sometimes trump logic. Representing missing elevations as zero may not make intuitive sense, but you may be dealing with an application that requires missing numbers to be represented as zeros. With character fields, you have the issue of trailing spaces. You would almost certainly trim trailing spaces from feature names, but what about code fields? We've seen programs that expect alphanumeric fields such as the feature_type field to be constant-width, even if that means including trailing spaces.
SQL*Loader has default behavior with respect to each one of these issues, and that default behavior is often reasonable and sufficient. However, SQL*Loader provides you with the flexibility to override the default behavior when necessary.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Concatenating Records
SQL*Loader has the capability to read multiple records from a data file and concatenate those records into one longer record. When you do this, the distinction between physical and logical records becomes important. The term physical record is always used to refer to the records read from the input data file. The term logical record refers to a record that SQL*Loader loads into the database. Usually, there's a one-to-one correspondence between physical and logical records. When you concatenate records together, you are combining two or more physical records into one logical record. SQL*Loader supports two main ways of doing this:
  • You can specify a fixed number of physical records to concatenate into one logical record.
  • You can specify a continuation flag in the physical record that indicates whether or not a given physical record should be concatenated to another.
The distinction between physical and logical records is important when it comes to specifying the positions of fields (using the POSITION clause) in the input record. The POSITION clause always refers to byte positions in the logical record. In general, SQL*Loader always deals with the logical record. The only time you specify positions with respect to the physical record is when you are defining continuation fields.
Use the CONCATENATE clause if you always want to concatenate a specific number of physical records into one logical record. Consider the following data, which splits the information for each geographical feature across two physical records:
MI Halfmoon Lake                                 798
lake      Alger
MI Cleveland Cliffs Basin                        779
reservoir Alger
MI Doe Lake                                      798
lake      Alger
MI Limestone                                     912
ppl       Alger
MI Traunik                                       925
ppl       Alger
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Nesting Delimited Fields
With columnar data, it's possible to take an individual field and treat it as if it were a series of delimited fields. In essence, depending on the representation used, you may be able to extract subfields. There are also some ways to deal with records that begin with fixed-width columns only to have a variable-length column thrown into the middle.
If you've ever worked on a MUMPS system, you are familiar with the concept of nested delimiters, and of using the $PIECE function to extract specific pieces of information from a delimited string. The following columnar data is a doctored version of the feature name data that you've seen so far in this chapter. It's not from a MUMPS system, but the records do contain one column with nested, delimited data:
MI Werners Creek           "46° 11' 07'' N,86° 58' 48'' W " stream Alger
MI Chapel Falls            "46° 14' 44'' N,86° 26' 40'' W " falls  Alger
MI Chapel Lake             "46° 32' N,86° 26' 40'' W      " lake   Alger
MI Birch Lakes             "46° 33' 13'' N,86° 9' 26'' W  " lake   Alger
MI Chapel Beach Campground "46° 32' 51'' N,86° 26' 29'' W " locale Alger
MI Legion Lake             "46° 31' 41'' N,186° 21' 47'' W" lake   Alger
MI Chapel Creek            "46° 32' 54'' N,86° 26' 21'' W " stream Alger
MI Nita, Lake              "46° 33' N,86° 3' 53'' W       " lake   Alger
MI West Branch Lakes       "46° 30' 47'' N,86° 6' 04'' W  " lake   Alger
MI Centerline Lake         "46° 30' 23'' N,86° 3' 50'' W  " lake   Alger
In this example, the latitude and longitude values for each feature are enclosed in a 30-character wide quoted string. A comma ( , ) serves to separate the two values, and the conventional degree, minute, and second notation is used. For example:
degree° minute' second''
The next two examples show you how to do two things. First, you'll see how to store the latitude and longitude into separate database columns. Then you'll see how to extract each degree, minute, and second value individually.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 5: Loading Delimited Data
Delimited data is the second of the two major classifications of data that you can load using SQL*Loader. Delimited data is usually textual in nature. Fields are separated by strings of characters referred to as delimiters. Issues you'll encounter when loading delimited data include the following:
  • Some fields, notably text fields, may be enclosed by characters such as quotation marks.
  • Fields may contain leading and/or trailing spaces that need to be trimmed away.
  • Records may not have all the fields that the LOAD statement calls for.
This chapter shows you how to use SQL*Loader's support for delimited data. You'll learn how to define delimiters and enclosing characters, and also to deal with nested occurrences of both delimited and fixed-width fields.
Delimited data is almost always textual. Thus, the SQL*Loader datatypes that you'll most commonly use are the external types:
CHAR
DATE
DECIMAL EXTERNAL
ZONED
ZONED EXTERNAL
INTEGER EXTERNAL
FLOAT EXTERNAL
The example data used in this chapter will be a delimited version of the same geographic feature name data that you saw in Chapter 4. The exact data format will vary with the particular aspect of SQL*Loader under discussion at any given time.
The destination table used for all examples can be created using the following statement:
CREATE TABLE michigan_features (
   feature_name VARCHAR2(44),
   elevation NUMBER,
   population NUMBER,
   feature_type VARCHAR2(10),
   county VARCHAR2(15),
   short_feature_name VARCHAR2(10),
   latitude VARCHAR2(15),
   longitude VARCHAR2(15),
   update_time DATE
);
This is very similar to what you saw in the previous chapter. The only differences are that we've added an update timestamp and a column to record the population.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Common Datatypes Encountered
Delimited data is almost always textual. Thus, the SQL*Loader datatypes that you'll most commonly use are the external types:
CHAR
DATE
DECIMAL EXTERNAL
ZONED
ZONED EXTERNAL
INTEGER EXTERNAL
FLOAT EXTERNAL
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Example Data
The example data used in this chapter will be a delimited version of the same geographic feature name data that you saw in Chapter 4. The exact data format will vary with the particular aspect of SQL*Loader under discussion at any given time.
The destination table used for all examples can be created using the following statement:
CREATE TABLE michigan_features (
   feature_name VARCHAR2(44),
   elevation NUMBER,
   population NUMBER,
   feature_type VARCHAR2(10),
   county VARCHAR2(15),
   short_feature_name VARCHAR2(10),
   latitude VARCHAR2(15),
   longitude VARCHAR2(15),
   update_time DATE
);
This is very similar to what you saw in the previous chapter. The only differences are that we've added an update timestamp and a column to record the population.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Using Delimiters to Identify Fields
A delimiter is a character, or string of characters, used to separate two fields. One of the most common delimiters is the comma. Spreadsheet programs such as Microsoft Excel commonly generate comma-delimited files. The following is an example of comma-delimited data:
MI,Alger School,school,Kent,425517N,0853840W
MI,Alger Swamp,swamp,Ogemaw,442306N,0840345W
MI,Algoma Cemetery,cemetery,Kent,430923N,0853731W
MI,Algoma Church,church,Kent,430920N,0853732W
One issue that you run into with comma-delimited data is the occasional need to represent the delimiter character in one of the values. This is rarely an issue with numeric fields, but is almost always an issue with text fields. Look at the following example:
MI,Algoma, Township of,civil,Kent,430942N,0853635W
This line contains seven fields delimited by commas, but it was really intended to contain only six. The geographic feature name in this case is "Algoma, Township of ", and the comma in the name makes that one field appear as if it were two. There are two ways of dealing with a problem like this, at least as far as SQL*Loader is concerned:
  • Use a delimiter character that never appears as part of a value.
  • Enclose all string values within quotes.
Choosing a different delimiter is a simple solution. If you know that your field values might contain commas, but are certain they will never contain vertical bars (|), you might represent your data as follows:
MI|Algoma, Township of|civil|Kent|430942N|0853635W
We don't particularly like this solution, because it's too easy to get in trouble down the road. Sure, you don't use vertical bars today, but can you guarantee that you will never use one? If your input programs allow users to enter a vertical bar, you can almost count on it happening some day.
A more robust approach is to enclose each field within quotation marks. For example:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Common Issues with Delimited Data
Content preview·Buy PDF of this chapter|