The Enterprise Data Catalog

Book description

Combing the web is simple, but how do you search for data at work? It's difficult and time-consuming, and can sometimes seem impossible. This book introduces a practical solution: the data catalog. Data analysts, data scientists, and data engineers will learn how to create true data discovery in their organizations, making the catalog a key enabler for data-driven innovation and data governance.

Author Ole Olesen-Bagneux explains the benefits of implementing a data catalog. You'll learn how to organize data for your catalog, search for what you need, and manage data within the catalog. Written from both a data management perspective and a library and information science perspective, this book helps you:

  • Learn what a data catalog is and how it can help your organization
  • Organize data and its sources into domains and describe them with metadata
  • Search data using techniques that range from very simple to complex, and learn to browse domains, data lineage, and graphs
  • Manage the data in your company via a data catalog
  • Implement a data catalog in a way that exactly matches the strategic priorities of your organization
  • Understand what the future has in store for data catalogs

Table of contents

  1. Foreword
  2. Preface
    1. Who Should Read This Book
    2. Navigating This Book
    3. Conventions Used in This Book
    4. O’Reilly Online Learning
    5. How to Contact Us
    6. Acknowledgments
  3. I. Organizing Data So You Can Search for It
  4. 1. Introduction to Data Catalogs
    1. The Core Functionality of a Data Catalog
      1. Create an Overview of the IT Landscape
      2. Organize Data
      3. Enable Search of Company Data
    2. Data Discovery
    3. The Data Discovery Team
      1. Data Architects
      2. Data Engineers
      3. Data Discovery Team Setup
    4. End-User Roles and Responsibilities
    5. Summary
  5. 2. Organize Data: Design a Robust Architecture for Search
    1. Organizing Domains in the Data Catalog
      1. Domain Architecture in a Data Catalog
      2. Understanding Domains
      3. Processes and Capabilities
      4. Data Sources
    2. Getting Assets into the Data Catalog
      1. Pull
      2. Push
    3. Organizing Assets in the Domains
      1. Asset Metadata
      2. Metadata Quality
    4. Classification
    5. Summary
  6. 3. Understand Search: Concepts, Features, and Mechanics
    1. Why Do You Search in a Data Catalog?
    2. Search Features in a Data Catalog
    3. Searching in Data Versus Searching for Data
    4. How Do You Search a Data Catalog?
      1. Data Catalog Query Language
      2. The Search Features in a Data Catalog Explained
      3. Searching for Everything?
    5. The Mechanics of Search
      1. Recall and Precision
      2. Zipf’s Law
      3. Serendipity
    6. Summary
  7. 4. Apply Search: From Simple to Advanced Patterns
    1. Search Like Librarians—Not Like Data Scientists
    2. Search Patterns
      1. Basic Simple Search
      2. Detailed Simple Search
      3. Flexible Simple Search
      4. Range Search
      5. Block Search
      6. Statement Search
    3. Browsing Patterns
      1. Glossary Browsing
      2. Domain Browsing
      3. Lineage Browsing
      4. Graph Browsing
    4. Searching a Graph-Based Data Catalog
    5. Summary
  8. II. Democratizing Data with a Data Catalog
  9. 5. Discover Data: Empower End Users and Engage Stakeholders
    1. A Data Catalog Is a Social Network
    2. Active Metadata
    3. Ensure Stakeholder Engagement
      1. Engage Data Governance Leaders
      2. Engage Data Analytics Leaders
      3. Engage Domain Leaders
    4. Seeing All Data Through One Lens
    5. The Operational Backbone and the Data Platform
    6. Summary
  10. 6. Access Data: The Keys to Successful Implementation
    1. Choosing a Data Catalog
      1. Vendor Analysis
      2. Some Key Vendors
      3. Catalog of Catalogs
    2. How to Access Data
      1. Data Providers and Data Consumers
      2. Centralized Approach
      3. Decentralized Approach
      4. Combined Approach
    3. Building Domains
      1. Questionnaire No. 1: Domain Owner Description of Domain and Assets
      2. Questionnaire No. 2: Asset Steward Description of Assets in the Domain
      3. Questionnaire No. 3: Asset Steward Description of the Glossary Terms of Their Assets
    4. Summary
  11. 7. Manage Data: Improve Lifecycle Management
    1. The Value of Data Lifecycle Management and Why the Data Catalog Is a Game Changer
    2. Various Lifecycles
      1. Data Lifecycle
      2. Using the Data Catalog for Data Lifecycle Management
      3. The Data Asset Lifecycle in the Data Catalog
      4. Glossary Term Lifecycle
      5. Data Source Lifecycle
      6. Lifecycle Influence and Support
    3. Applied Search Based on Lifecycles
    4. Applied Search for Regulatory Compliance
    5. Maintenance Best Practices
      1. Maintenance of the Data Outside the Data Catalog
      2. Maintenance of Metadata Inside the Data Catalog
    6. Improved Data Lifecycle Management
    7. Summary
  12. III. Envisioning the Future of Data Catalogs
  13. 8. Looking Ahead: The Company Search Engine and Improved Data Management
    1. The Company Search Engine
    2. The Company Search Engine in Hugin & Munin
    3. From Data to Knowledge
    4. A Medium Theoretical Take on the Company Search Engine
      1. Is the Company Search Engine New?
      2. Will the Company Search Engine Become Reality?
    5. Summary
  14. Afterword
    1. Consider Implementing a Data Catalog
    2. Follow Me
  15. Appendix. Data Catalog Query Language
  16. Index
  17. About the Author

Product information

  • Title: The Enterprise Data Catalog
  • Author(s): Ole Olesen-Bagneux
  • Release date: February 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492098713