Book description
An unparalleled collection of recommended guidelines for data warehousing and business intelligence pioneered by Ralph Kimball and his team of colleagues from the Kimball Group.
Recognized and respected throughout the world as the most influential leaders in the data warehousing industry, Ralph Kimball and the Kimball Group have written articles covering more than 250 topics that define the field of data warehousing. For the first time, the Kimball Group's incomparable advice, design tips, and best practices have been gathered in this remarkable collection of articles, which spans a decade of data warehousing innovation.
Each group of articles is introduced with original commentaries that explain their role in the overall lifecycle methodology developed by the Kimball Group. These practical, hands-on articles are fully updated to reflect current practices and terminology and cover the complete lifecycle—including project planning, requirements gathering, dimensional modeling, ETL, and business intelligence and analytics.
This easily referenced collection is nothing less than vital if you are involved with data warehousing or business intelligence in any capacity.
Table of contents
- Copyright
- About the Authors
- Credits
- Acknowledgments
- Introduction
-
1. The Reader at a Glance
- 1.1. Setting Up for Success
-
1.2. Tackling DW/BI Design and Development
- 1.2.1. 1.3 Data Wrangling
- 1.2.2. 1.4 Myth Busters
- 1.2.3. 1.5 Dividing the World
-
1.2.4. 1.6 Essential Steps for the Integrated Enterprise Data Warehouse
- 1.2.4.1. What Does an Integrated EDW Deliver?
- 1.2.4.2. Ultimate Litmus Test for Integration
- 1.2.4.3. Organizational Challenges
- 1.2.4.4. Conformed Dimensions and Facts
- 1.2.4.5. Using the Bus Matrix to Communicate with Executives
- 1.2.4.6. Managing the Integrated EDW Backbone
- 1.2.4.7. The Dimension Manager
- 1.2.4.8. The Fact Provider
- 1.2.4.9. Configuring Business Intelligence (BI) Tools
- 1.2.4.10. Joint Responsibilities
- 1.2.5. 1.7 Drill Down to Ask Why
- 1.2.6. 1.8 Slowly Changing Dimensions
- 1.2.7. 1.9 Judge Your BI Tool through Your Dimensions
- 1.2.8. 1.10 Fact Tables
- 1.2.9. 1.11 Exploit Your Fact Tables
-
2. Before You Dive In
-
2.1. Historical Perspective
- 2.1.1. 2.1 The Database Market Splits
-
2.1.2. 2.2 Bringing Up Supermarts
- 2.1.2.1. The Planning Crisis
- 2.1.2.2. Data Marts with an Architecture
- 2.1.2.3. Importance of Conformed Dimensions
- 2.1.2.4. Designing Conformed Dimensions
- 2.1.2.5. Taking the Pledge
- 2.1.2.6. Permissible Variations of Conformed Dimensions
- 2.1.2.7. Establishing Standard Fact Definitions
- 2.1.2.8. Importance of Granularity
- 2.1.2.9. Higher Level Data Marts
- 2.1.2.10. Rescuing Stovepipes
- 2.1.2.11. When You Don't Need Conformed Dimensions
- 2.1.2.12. Clear Vision
- 2.2. Dealing with Demanding Realities
-
2.1. Historical Perspective
-
3. Project/Program Planning
- 3.1. Professional Responsibilities
- 3.2. Justification and Sponsorship
-
3.3. Kimball Methodology
- 3.3.1. 3.10 Kimball Lifecycle in a Nutshell
- 3.3.2. 3.11 Off the Bench
- 3.3.3. 3.12 The Anti-Architect
- 3.3.4. 3.13 Think Critically When Applying Best Practices
-
3.3.5. 3.14 Eight Guidelines for Low Risk Enterprise Data Warehousing
- 3.3.5.1. Work on the Right Thing
- 3.3.5.2. Give Business Users Control
- 3.3.5.3. Proceed Incrementally
- 3.3.5.4. Start with Lightweight, Focused Governance
- 3.3.5.5. Build a Simple, Universal Platform
- 3.3.5.6. Integrate Using Conformed Dimensions
- 3.3.5.7. Manage Quality a Few Screens at a Time
- 3.3.5.8. Use Surrogate Keys Throughout
- 3.3.6. 3.15 Relating to Agile Methodologies
- 3.3.7. 3.16 Is Agile Enterprise Data Warehousing an Oxymoron?
-
4. Requirements Definition
- 4.1. Gathering Requirements
- 4.2. Organizing around Business Processes
- 4.3. Wrapping Up the Requirements
-
5. Data Architecture
- 5.1. Making the Case for Dimensional Modeling
- 5.2. Enterprise Data Warehouse Bus Architecture
- 5.3. Integration Instead of Centralization
- 5.4. Contrast with the Corporate Information Factory
-
6. Dimensional Modeling Fundamentals
-
6.1. Basics of Dimensional Modeling
- 6.1.1. 6.1 Fact Tables and Dimension Tables
- 6.1.2. 6.2 Drilling Down, Up, and Across
- 6.1.3. 6.3 The Soul of the Data Warehouse, Part One: Drilling Down
- 6.1.4. 6.4 The Soul of the Data Warehouse, Part Two: Drilling Across
- 6.1.5. 6.5 The Soul of the Data Warehouse, Part Three: Handling Time
- 6.1.6. 6.6 Graceful Modifications to Existing Fact and Dimension Tables
- 6.2. Dos and Don'ts
- 6.3. Myths about Dimensional Modeling
-
6.1. Basics of Dimensional Modeling
-
7. Dimensional Modeling Tasks and Responsibilities
- 7.1. Design Activities
-
7.2. Design Review Activities
-
7.2.1. 7.6 Fistful of Flaws
- 7.2.1.1. What's the Grain?
- 7.2.1.2. Mixed Grain or Textual Facts?
- 7.2.1.3. Dimension Descriptors and Decodes?
- 7.2.1.4. Handling of Hierarchies?
- 7.2.1.5. Explicit Date Dimension?
- 7.2.1.6. Control Numbers as Degenerate Dimensions?
- 7.2.1.7. Surrogate Keys?
- 7.2.1.8. Slowly Changing Dimension Strategies?
- 7.2.1.9. Well Understood Business Requirements?
- 7.2.2. 7.7 Rating Your Dimensional Data Warehouse
-
7.2.1. 7.6 Fistful of Flaws
-
8. Fact Table Core Concepts
- 8.1. Granularity
-
8.2. Types of Fact Tables
- 8.2.1. 8.5 Fundamental Grains
- 8.2.2. 8.6 Modeling a Pipeline with an Accumulating Snapshot
- 8.2.3. 8.7 Combining Periodic and Accumulating Snapshots
- 8.2.4. 8.8 Modeling Time Spans
- 8.2.5. 8.9 A Rolling Prediction of the Future, Now and in the Past
- 8.2.6. 8.10 Factless Fact Tables
- 8.2.7. 8.11 Factless Fact Tables? Sound Like Jumbo Shrimp?
- 8.2.8. 8.12 What Didn't Happen
- 8.3. Parent-Child Fact Tables
- 8.4. Fact Table Keys and Degenerates
-
8.5. Miscellaneous Fact Table Design Patterns
- 8.5.1. 8.19 Put Your Fact Tables on a Diet
- 8.5.2. 8.20 Keeping Text Out of the Fact Table
- 8.5.3. 8.21 Dealing with Nulls in a Dimensional Model
- 8.5.4. 8.22 Modeling Data as Both a Fact and Dimension Attribute
- 8.5.5. 8.23 When a Fact Table Can Be Used as a Dimension Table
- 8.5.6. 8.24 Sparse Facts and Facts with Short Lifetimes
- 8.5.7. 8.25 Pivoting the Fact Table with a Fact Dimension
-
9. Dimension Table Core Concepts
- 9.1. Dimension Table Keys
- 9.2. Date and Time Dimension Considerations
-
9.3. Miscellaneous Dimension Patterns
- 9.3.1. 9.8 Data Warehouse Role Models
- 9.3.2. 9.9 Mystery Dimensions
- 9.3.3. 9.10 De-Clutter with Junk Dimensions
- 9.3.4. 9.11 Showing the Correlation Between Dimensions
- 9.3.5. 9.12 Causal (Not Casual) Dimensions
- 9.3.6. 9.13 Resist Abstract Generic Dimensions
- 9.3.7. 9.14 Hot-Swappable Dimensions
- 9.3.8. 9.15 Accurate Counting with a Dimensional Supplement
-
9.4. Slowly Changing Dimensions
- 9.4.1. 9.16 Perfectly Partitioning History with Type 2 SCD
- 9.4.2. 9.17 Many Alternate Realities
- 9.4.3. 9.18 Monster Dimensions
- 9.4.4. 9.19 When a Slowly Changing Dimension Speeds Up
- 9.4.5. 9.20 When Do Dimensions Become Dangerous?
- 9.4.6. 9.21 Slowly Changing Dimensions Are Not Always as Easy as 1, 2, and 3
- 9.4.7. 9.22 Dimension Row Change Reason Attributes
-
10. More Dimension Patterns and Case Studies
- 10.1. Snowflakes, Outriggers, and Bridges
-
10.2. Dealing with Hierarchies
- 10.2.1. 10.6 Maintaining Dimension Hierarchies
- 10.2.2. 10.7 Help for Hierarchies
-
10.2.3. 10.8 Five Alternatives for Better Employee Dimensional Modeling
- 10.2.3.1. Alternative 1: Bridge Table Using Surrogate Keys
- 10.2.3.2. Alternative 2: Bridge Table with Separate Reports-To Dimension
- 10.2.3.3. Alternative 3: Bridge Table with Natural Keys
- 10.2.3.4. Alternative 4: Forced Fixed-Depth Hierarchy Technique
- 10.2.3.5. Alternative 5: The PathString Attribute
- 10.2.3.6. Recommendation
- 10.2.4. 10.9 Alternate Hierarchies
- 10.3. Customer Issues
- 10.4. Addresses and International Issues
-
10.5. Industry Scenarios and Idiosyncrasies
- 10.5.1. 10.17 An Insurance Data Warehouse Case Study
- 10.5.2. 10.18 Traveling through Databases
- 10.5.3. 10.19 Human Resources Dimensional Models
- 10.5.4. 10.20 Not So Fast
- 10.5.5. 10.21 The Budgeting Chain
- 10.5.6. 10.22 Compliance-Enabled Data Warehouses
- 10.5.7. 10.23 Clicking with Your Customer
- 10.5.8. 10.24 The Special Dimensions of the Clickstream
- 10.5.9. 10.25 Fact Tables for Text Document Searching
- 10.5.10. 10.26 Enabling Market Basket Analysis
-
11. Back Room ETL and Data Quality
-
11.1. Planning the ETL System
-
11.1.1. 11.1 Surrounding the ETL Requirements
- 11.1.1.1. Business Needs
- 11.1.1.2. Compliance
- 11.1.1.3. Data Quality via Data Profiling
- 11.1.1.4. Security
- 11.1.1.5. Data Integration and the 360 Degree View
- 11.1.1.6. Data Latency
- 11.1.1.7. Archiving and Lineage
- 11.1.1.8. BI User Delivery Interfaces
- 11.1.1.9. Available Skills
- 11.1.1.10. Legacy Licenses
- 11.1.2. 11.2 The 34 Subsystems of ETL
-
11.1.3. 11.3 Doing the Work at Extract Time
- 11.1.3.1. Modeling Events across Multiple Time Zones
- 11.1.3.2. Verbose Calendar Dimensions
- 11.1.3.3. Keeping the Books across Multiple Currencies
- 11.1.3.4. Product Pipeline Measurements
- 11.1.3.5. Physical Completeness of the Profit and Loss
- 11.1.3.6. Heterogeneous Products
- 11.1.3.7. Aggregations in General
- 11.1.3.8. Dimensional Modeling in General
-
11.1.4. 11.4 Is Data Staging Relational?
- 11.1.4.1. Dimension Processing
- 11.1.4.2. Deciding What Has Changed
- 11.1.4.3. Combining from Separate Sources
- 11.1.4.4. Data Cleaning
- 11.1.4.5. Processing Names and Addresses
- 11.1.4.6. Validating One-to-One and One-to-Many Relationships
- 11.1.4.7. Fact Processing
- 11.1.4.8. Aggregate Processing
- 11.1.4.9. The Bottom Line: Is Data Staging Relational?
- 11.1.5. 11.5 Staging Areas and ETL Tools
- 11.1.6. 11.6 Should You Use an ETL Tool?
- 11.1.7. 11.7 Document the ETL System
- 11.1.8. 11.8 Measure Twice, Cut Once
- 11.1.9. 11.9 Brace for Incoming
- 11.1.10. 11.10 Building a Change Data Capture System
-
11.1.1. 11.1 Surrounding the ETL Requirements
-
11.2. Data Quality Considerations
- 11.2.1. 11.11 Dealing with Dirty Data
- 11.2.2. 11.12 An Architecture for Data Quality
- 11.2.3. 11.13 Indicators of Quality
- 11.2.4. 11.14 Is Your Data Correct?
-
11.2.5. 11.15 Eight Recommendations for International Data Quality
- 11.2.5.1. Languages and Character Sets
- 11.2.5.2. Cultures, Names, and Salutations
- 11.2.5.3. Geographies and Addresses
- 11.2.5.4. Privacy and Information Transfer
- 11.2.5.5. International Compliance
- 11.2.5.6. Currencies
- 11.2.5.7. Time Zones, Calendars, and Date Formats
- 11.2.5.8. Numbers
- 11.2.5.9. Architectures for International Data Quality
- 11.2.6. 11.16 Using Regular Expressions for Data Cleaning
-
11.3. Populating Fact and Dimension Tables
- 11.3.1. 11.17 Pipelining Your Surrogates
- 11.3.2. 11.18 Replicating Dimensions Correctly
- 11.3.3. 11.19 Identify Dimension Changes Using Cyclic Redundancy Checksums
- 11.3.4. 11.20 Maintaining Back Pointers to Operational Sources
- 11.3.5. 11.21 Creating Historical Dimension Rows
- 11.3.6. 11.22 Backward in Time
- 11.3.7. 11.23 Early-Arriving Facts
- 11.3.8. 11.24 Slowly Changing Entities
- 11.3.9. 11.25 Creating, Using, and Maintaining Junk Dimensions
- 11.3.10. 11.26 Using the SQL MERGE for Slowly Changing Dimensions
- 11.3.11. 11.27 Being Offline as Little as Possible
- 11.4. Supporting Real Time
-
11.1. Planning the ETL System
-
12. Technical Architecture Considerations
-
12.1. Overall Technical/System Architecture
- 12.1.1. 12.1 Can the Data Warehouse Benefit from SOA?
- 12.1.2. 12.2 Picking the Right Approach to MDM
- 12.1.3. 12.3 Building Custom Tools for the DW/BI System
- 12.1.4. 12.4 Welcoming the Packaged App
- 12.1.5. 12.5 ERP Vendors: Bring Down Those Walls
- 12.1.6. 12.6 Building a Foundation for Smart Applications
- 12.1.7. 12.7 RFID Tags and Smart Dust
-
12.2. Presentation Server Architecture
- 12.2.1. 12.8 The Aggregate Navigator
- 12.2.2. 12.9 Aggregate Navigation with (Almost) No Metadata
- 12.2.3. 12.10 Relating to OLAP
- 12.2.4. 12.11 Dimensional Relational versus OLAP: The Final Deployment Conundrum
- 12.2.5. 12.12 Dimensional Modeling for Microsoft Analysis Services
- 12.2.6. 12.13 Architecting Your Data for Microsoft SQL Server 2005
- 12.2.7. 12.14 Microsoft SQL Server Comes of Age for Data Warehousing
- 12.3. Front Room Architecture
- 12.4. Metadata
- 12.5. Infrastructure and Security Considerations
-
12.1. Overall Technical/System Architecture
-
13. Front Room Business Intelligence Applications
- 13.1. Delivering Value with Business Intelligence
- 13.2. Implementing the Business Intelligence Layer
- 13.3. Mining Data to Uncover Relationships
- 13.4. Dealing with SQL
-
14. Maintenance and Growth Considerations
- 14.1. Deploying Successfully
-
14.2. Sustaining for Ongoing Impact
- 14.2.1. 14.5 Data Warehouse Checkups
- 14.2.2. 14.6 Boosting Business Acceptance
- 14.2.3. 14.7 Educate Management to Sustain DW/BI Success
- 14.2.4. 14.8 Getting Your Data Warehouse Back on Track
- 14.2.5. 14.9 Upgrading Your BI Architecture
- 14.2.6. 14.10 Four Fixes for Legacy Data Warehouses
- 14.2.7. 14.11 A Data Warehousing Fitness Program for Lean Times
- Article Index
Product information
- Title: The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence
- Author(s):
- Release date: February 2010
- Publisher(s): Wiley
- ISBN: 9780470563106
You might also like
book
Business Intelligence Guidebook
Between the high-level concepts of business intelligence and the nitty-gritty instructions for using vendors’ tools lies …
book
Data Warehousing Fundamentals for IT Professionals
Cutting-edge content and guidance from a data warehousing expert—now expanded to reflect field trends Data warehousing …
book
Data Management at Scale
As data management and integration continue to evolve rapidly, storing all your data in one place, …
book
Agile Data Warehousing for the Enterprise
Building upon his earlier book that detailed agile data warehousing programming techniques for the Scrum master, …