Advanced SQL with SAS

Book description

This book introduces advanced techniques for using PROC SQL in SAS. If you are a SAS programmer, analyst, or student who has mastered the basics of working with SQL, Advanced SQL with SAS® will help take your skills to the next level. Filled with practical examples with detailed explanations, this book demonstrates how to improve performance and speed for large data sets. Although the book addresses advanced topics, it is designed to progress from the simple and manageable to the complex and sophisticated.

In addition to numerous tuning techniques, this book also touches on implicit and explicit pass-throughs, presents alternative SAS grid- and cloud-based processing environments, and compares SAS programming languages and approaches including FedSQL, CAS, DS2, and hash programming. Other topics include:

  • Missing values and data quality with audit trails
  • “Blind spots” like how missing values can affect even the simplest calculations and table joins
  • SAS macro language and SAS macro programs
  • SAS functions
  • Integrity constraints
  • SAS Dictionaries
  • SAS Compute Server

Table of contents

  1. Preface
    1. About This Book
    2. What Is SAS, for Me?
    3. Acknowledgments
    4. Acknowledgments for the 2012 Edition
  2. About This Book
    1. What Does This Book Cover?
    2. Is This Book for You?
    3. What Should You Know about the Examples?
      1. Software Used to Develop the Book’s Content
      2. Example Code and Data
      3. SAS OnDemand for Academics
    4. We Want to Hear from You
    5. Author Feedback
  3. About The Author
  4. Chapter 1: Overview
    1. 1.1 Detailed Description of This Book
    2. 1.2 Which SAS Language for the CAS Environment?
  5. Chapter 2: Missing Values
    1. 2.1 From the Start: Defining Missing Values
      1. 2.1.1 Missing Values in Numeric Variables
      2. 2.1.2 Missing Values in String Variables
      3. 2.1.3 Example: Test Data for Handling Missing Values
      4. 2.1.4 Defining Missing Entries when Creating Empty Tables
      5. 2.1.5 SQL Approach: Creating an Empty Table Including Integrity Constraints for Missing Values
      6. 2.1.6 DATA Step Approach
    2. 2.2 Queries for Missing Values
      1. 2.2.1 Query Missing Values in a Numeric Column
      2. 2.2.2 Exclusion of Missing Values from Two Numerical Columns
      3. 2.2.3 Query of User-defined Missing Values
      4. 2.2.4 Query of Missing Values in a String Column
      5. 2.2.5 Variables, Accesses, and Possibly Undesired Results of Queries for Missing Values
    3. 2.3 Missing Values in Aggregating Functions
      1. 2.3.1 Counting Existing, Duplicate, and Missing Values with COUNT
      2. 2.3.2 Variations in Aggregating Functions
      3. 2.3.3 Adjusting an Aggregating Function Using CASE
      4. 2.3.4 Final Remarks on DISTINCT
    4. 2.4 Possibly Undesirable Results with WHERE, GROUP, and ORDER
      1. 2.4.1 Unwanted Results with WHERE
      2. 2.4.2 Undesired Results when Sorting Data
      3. 2.4.3 Undesirable Results when Grouping Data
      4. 2.4.4 Undesired Results when Joining Tables
    5. 2.5 Possibly Undesirable Results with Joins
      1. 2.5.1 Examples I: Self-Joins on a Single Table
      2. 2.5.2 Examples II: Missing Values in Multiple Tables
    6. 2.6 Searching and Replacing Missing Values
      1. 2.6.1 Searching for Missing Values (Screening)
      2. 2.6.2 Searching and Replacing of Missing Values (Conversion)
    7. 2.7 Predicates
      1. 2.7.1 ALL, ANY, and SOME
      2. 2.7.2 EXISTS or NOT EXISTS
      3. 2.7.3 IN and NOT IN
      4. 2.7.4 IS NULL, IS NOT NULL, IS MISSING, and IS NOT MISSING
      5. 2.7.5 LIKE and NOT LIKE
  6. Chapter 3: Data Quality with PROC SQL
    1. 3.1 Integrity Constraints and Audit Trails
      1. 3.1.1 Integrity Constraints (Check Rules)
      2. 3.1.2 Audit Trails
    2. 3.2 How to Identify and Filter Multiple Values
      1. 3.2.1 Approach 1: Displaying Duplicate IDs (Univariate)
      2. 3.2.2 Approach 2: Filtering of Duplicate Values (HAVING COUNT)
      3. 3.2.3 Approach 3: Finding Duplicate Values (Multivariate)
      4. 3.2.4 Approach 4: Creating Lists for Duplicate Rows (Macro Variable)
      5. 3.2.5 Approach 5: Checking for Duplicate Entries (Macro)
      6. 3.2.6 Approach 6: Identifying Duplicates in Multiple Tables
    3. 3.3 Identifying and Filtering Outliers
      1. 3.3.1 Approach 1: Checking for Outliers Using Descriptive Statistics
      2. 3.3.2 Approach 2: Checking for Outliers Using Statistical Tests (David Test)
      3. 3.3.3 Approach 3: Filtering Outliers Using Conditions
    4. 3.4 Uniformity: Identify, Filter, and Replace Characters
      1. 3.4.1 Approach 1: Checking for Strings in Terms of Longer Character Strings
      2. 3.4.2 Approach 2: Complete, Partially, or Not at All: Checking for Multiple Characters
      3. 3.4.3 Approach 3: Details: Checking for Single Characters
  7. Chapter 4: Macro Programming with PROC SQL
    1. 4.1 Macro Variables
      1. 4.1.1 Automatic SAS Macro Variables
      2. 4.1.2 Automatic SQL Macro Variables
      3. 4.1.3 User-defined SAS Macro Variables (INTO)
      4. 4.1.4 Macro Variables, INTO and Possible Loss of Precision
      5. 4.1.5 The Many Roads Leading to a Macro Variable
    2. 4.2 Macro Programs with PROC SQL
      1. 4.2.1 What Are Macro Programs?
      2. 4.2.2 Let’s Do It: SAS Macros with the %LET Statement
      3. 4.2.3 Listwise Execution of Commands
      4. 4.2.4 Condition-based Execution of Commands
      5. 4.2.5 Tips for the Use of Macros
    3. 4.3 Elements of the SAS Macro Language
      1. 4.3.1 SAS Macro Functions
      2. 4.3.2 SAS Macro Statements
      3. 4.3.3 Interfaces I: From the SAS Macro Facility to the DATA Step
      4. 4.3.4 Interfaces II: From PROC SQL to the SAS Macro Facility (INTO)
    4. 4.4 Application 1: Rowwise Data Update Including Security Check
    5. 4.5 Application 2: Working with Multiple Files (Splitting)
      1. 4.5.1 Splitting a Data Set into Uniformly Filtered Subsets (Split Variable is of Type “String”)
      2. 4.5.2 Splitting a Data Set into Uniformly Filtered Subsets (Split Variable is of Type “Numeric”)
    6. 4.6 Application 3: Transposing a SAS Table (Stack and Unstack)
      1. 4.6.1 Stack
      2. 4.6.2 Unstack
      3. 4.6.3 From Stack to Unstack (From 1 to 3)
      4. 4.6.4 From Unstack to Stack (From 3 to 1)
    7. 4.7 Application 4: Macros to Retrieve System Information
      1. 4.7.1 Query the Contents of SAS tables (VARLIST, VARLIST2)
      2. 4.7.2 Searching a SAS Table for a Specific Column (DO_VAR_EX and DO_VAR_EX2)
      3. 4.7.3 Search in the Dictionary.Options (IS_TERM)
    8. 4.8 Application 5: Creating Folders for Data Storage
      1. 4.8.1 Macro MYSTORAGE and Call
    9. 4.9 Application 6: Consecutive “Exotic” Names for SAS Columns (“2010”, “2011”, ...)
      1. 4.9.1 Sample Output
      2. 4.9.2 Macro EXOTICS
      3. 4.9.3 Explanation
      4. 4.9.4 Sample Call
    10. 4.10 Application 7: Converting Entire Lists of String Variables to Numeric Variables
      1. 4.10.1 Explanation
      2. 4.10.2 Application
      3. 4.10.3 Requesting Outputs Before and After the Conversion (PROC CONTENTS)
      4. 4.10.4 Further Notes on the Six Steps Presented
  8. Chapter 5: SQL for Geodata
    1. 5.1 Geographical Data and Distances
      1. 5.1.1 Distances in Two-Dimensional Space (Basis: Metric Coordinates)
      2. 5.1.2 Distances in Spherical Space (Basis: Longitudes and Latitudes)
      3. 5.1.3 SAS Functions for the Calculation of Geodetic Distances Plus Notes about EUCLID
    2. 5.2 SQL Queries for Coordinates
    3. 5.3 SQL and Maps
      1. 5.3.1 SAS Program (Five Steps)
      2. 5.3.2 Notes and Explanations
  9. Chapter 6: Hash Programming as an Alternative to SQL
    1. 6.1 What Is Hash Programming?
      1. 6.1.1 Hash Programming from Three Angles
      2. 6.1.2 Rowwise Explanation of an Easy Introductory Example
      3. 6.1.3 Step-by-step Variations of the Introductory Example
      4. 6.1.4 Sample Data for Benchmark Tests
    2. 6.2 Working with One Table
      1. 6.2.1 Aggregating with SQL, SUMMARY, and Hash
      2. 6.2.2 Sorting with SORT, SQL, and Hash
      3. 6.2.3 Subsetting: Filtering and Random Sampling
      4. 6.2.4 Eliminating Duplicate Keys
      5. 6.2.5 Querying Values (Retrieval)
    3. 6.3 Working with Multiple Tables
      1. 6.3.1 Inner and Outer Joins with Two Tables
      2. 6.3.2 Fuzzy Join with Two Tables
      3. 6.3.3 Splitting of an Unsorted SAS Table
    4. 6.4 Overview: Elements of Hash Programming
  10. Chapter 7: FedSQL
    1. 7.1 Benefits and Specifics of FedSQL Language
      1. 7.1.1 FedSQL Compared to PROC SQL
      2. 7.1.2 Specifics of FedSQL: SAS versus CAS
      3. 7.1.3 FEDSQL Syntax: Outline and Supported Statements
      4. 7.1.4 The FedSQL Environment: Components of SAS Viya: SAS and CAS
    2. 7.2 Using FEDSQL Syntax in SAS and CAS
      1. 7.2.1 FEDSQL on SAS: Replacing SQL with FEDSQL Successfully
      2. 7.2.2 FEDSQL on SAS: Replacing PROC SQL with FEDSQL with Tuning
      3. 7.2.3 Using FedSQL in CAS: PROC FEDSQL and PROC CAS
      4. 7.2.4 Not Just FEDSQL on CAS: DATA Step and Others
      5. 7.2.5 CAS Procedures: Statistics, Data Mining, and Machine Learning
    3. 7.3 FEDSQL in DS2 Programs
      1. 7.3.1 Building Blocks of DS2
      2. 7.3.2 Applications of DS2 Programs
    4. 7.4 FEDSQL Syntax (Details)
      1. 7.4.1 Outline of FEDSQL Syntax
      2. 7.4.2 Short Overview of FEDSQL Statements
      3. 7.4.3 Short Description of FEDSQL Connection Options
      4. 7.4.4 Short Description of FEDSQL Processing Options
      5. 7.4.5 Attributes for the CONN= Option (Connection Option)
      6. 7.4.6 Parameters for the CNTL= Option (Processing Option)
    5. 7.5 Summary: When to Use FedSQL and When to Use SQL
  11. Chapter 8: Performance and Efficiency
    1. 8.1 Introduction to Performance and Efficiency
      1. 8.1.1 The Price of Performance (Efficiency)
      2. 8.1.2 The Need for Speed: Considerations and Tips for Better Performance
      3. 8.1.3 A Strategy as a SAS Program
    2. 8.2 Less is More: Narrowing Large Tables Down to the Essential
      1. 8.2.1 Reduce Columns: KEEP, DROP, and SELECT Statements
      2. 8.2.2 Reduce Rows: WHERE Statement, Subsetting IF, and Data Set Options
      3. 8.2.3 Reduce Tables by Deleting (DELETE)
      4. 8.2.4 Reduce Structures: Transposing Tables
      5. 8.2.5 Eliminate Unnecessary Characters by Means of Tag Sets
      6. 8.2.6 Aggregate
    3. 8.3 Squeeze Even More Air Out of Data: Shortening and Compression
      1. 8.3.1 Shortening the Variable Length (LENGTH Statement)
      2. 8.3.2 To Compress or Not Compress? COMPRESS Function
    4. 8.4 Sorting? The Fewer the Better
      1. 8.4.1 No Sorting: Do Without ORDER BY
      2. 8.4.2 Minimal Sorting Using CREATE INDEX: Indexing Instead of Sorting
      3. 8.4.3 Minimal Reading (BY Processing)
      4. 8.4.4 Sorting Supported by Hardware (Multi-Threading)
    5. 8.5 Accelerate: Special Tricks for Special Occasions (SQL and More)
      1. 8.5.1 Fundamental Techniques
      2. 8.5.2 SQL-specific Techniques
      3. 8.5.3 Other Techniques
    6. 8.6 Data Processing in SAS or in the DBMS: Tuning of SQL to DBMS
      1. 8.6.1 Transfer of SQL to the DBMS
      2. 8.6.2 Tracing of SQL Passed to the DBMS (SASTRACE)
      3. 8.6.3 PROC SQL Operations that SAS can Pass to the DBMS
      4. 8.6.4 Passing an SQL Statement Directly to a DBMS (DIRECT_EXE)
      5. 8.6.5 More Tips and Tricks for Performance: VIO Method
    7. 8.7 Step by Step to Greater Performance: Performance as a Strategy
      1. 8.7.1 Optimization of I/O
      2. 8.7.2 Optimization of Memory Usage
      3. 8.7.3 Optimization of CPU Performance
      4. 8.7.4 Plan the Program, Program the Plan, Test the Program
      5. 8.7.5 Go Beyond SQL and Programming
    8. 8.8 Beyond SQL and Programming: Environments
      1. 8.8.1 SAS Programming on a SAS Grid
      2. 8.8.2 SAS Cloud Analytic Services (CAS)
  12. Chapter 9: Tips, Tricks, and More
    1. 9.1 Runtime as the Key to Performance
      1. 9.1.1 Overview
      2. 9.1.2 Scenario I: Default Setting: CPU Time versus Real time
      3. 9.1.3 Scenario II: FULLSTIMER Option: CPU time, Real time, and Memory
      4. 9.1.4 Scenario III: Aggregating Calculation of the Runtime (Macro)
      5. 9.1.5 Scenario IV: Differentiated Calculation of the Runtime (ARM Macros)
    2. 9.2 To Be in the Know: SAS Dictionaries
      1. 9.2.1 Query a Dictionary as a Table and View
      2. 9.2.2 Refining the Query of a Dictionary (WHERE)
      3. 9.2.3 Example 1: Finding the Storage Location of Certain Variables (Macro WHR_IS_VAR)
      4. 9.2.4 Example 2: Query the Number of Rows in Specific Tables (Macro N_ROWS)
      5. 9.2.5 Example 3: Renaming Complete Variable Lists Using Dictionaries
    3. 9.3 Data Handling and Data Structuring
      1. 9.3.1 Topic 1: Creating “Exotic” Column Names
      2. 9.3.2 Topic 2: Creating a Primary Key (MONOTONIC and Safer Options)
      3. 9.3.3 Topic 3: Analyze and Structure: Segmenting a SAS Table (MOD Function)
      4. 9.3.4 Topic 4: Defining a Tag Set for the Export of SAS Tables into CSV Format (PROC TEMPLATE)
      5. 9.3.5 Topic 5: Protecting Contents of SAS Tables (Passwords)
    4. 9.4 Updating Tables (SQL versus DATA Step)
      1. 9.4.1 Scenario I: MASTER/UPDATE without Multiple IDs: Problem: System-defined Missing Values  
      2. 9.4.2 Scenario II: MASTER/UPDATE without Multiple IDs: Problem: User-defined Missing Values  
      3. 9.4.3 Scenario III: MASTER/UPDATE with Multiple IDs: Problems with Multiple IDs
  13. Chapter 10: SAS Syntax—PROC SQL, SAS Functions, and SAS CALL Routines
    1. 10.1 PROC SQL Syntax Overview
      1. 10.1.1 Schema of PROC SQL Syntax
      2. 10.1.2 Short Description of PROC SQL Options
      3. 10.1.3 Selected SAS System Options for PROC SQL
    2. 10.2 SAS Functions and CALL Routines (Overview)
      1. 10.2.1 A Small Selection of SAS Functions and SAS CALL Routines
      2. 10.2.2 Quick Finder: Categories of SAS Functions and CALL Routines
      3. 10.2.3 Categories and Descriptions of SAS Functions and SAS CALL Routines
    3. 10.3 Pass-Through Facility (Features)
      1. 10.3.1 Implicit Pass-Through: Options
      2. 10.3.2 Explicit Pass-Through: Statements and Component
      3. 10.3.3 Explicit Pass-Through: Examples for DBMS-specific Features
  14. References
  15. Syntax Index
  16. Subject Index

Product information

  • Title: Advanced SQL with SAS
  • Author(s): Christian FG Schendera
  • Release date: May 2022
  • Publisher(s): SAS Institute
  • ISBN: 9781955977890