Data Modeling for Azure Data Services

Book description

Choose the right Azure data service and the correct model design to successfully implement your data model with the help of this hands-on guide

Key Features

  • Design a cost-effective, performant, and scalable database in Azure
  • Choose and implement the most suitable design for a database
  • Discover how your database can scale with growing data volumes, concurrent users, and query complexity

Data is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and implementing the right design is paramount to success.

Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL pools. As you progress through the chapters, you'll learn about data analytics with Azure Data Lake and Azure Synapse, explore dimensional modeling and Data Vault modeling, and design and implement a data lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory.

By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.

What you will learn

  • Model a relational database using normalization, dimensional modeling, or Data Vault modeling
  • Provision and implement Azure SQL DB and Azure Synapse SQL Pools
  • Discover how to model a Data Lake and implement it using Azure Storage
  • Model a NoSQL database and provision and implement an Azure Cosmos DB database
  • Use Azure Data Factory to implement ETL/ELT processes
  • Create a star schema model using dimensional modeling

Who this book is for

This book is for business intelligence developers and consultants who work on modern cloud data warehousing and design and implement databases. Beginner-level knowledge of cloud data management is expected.

Table of contents

  1. Data Modeling for Azure Data Services
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share Your Thoughts
  6. Section 1 – Operational/OLTP Databases
  7. Chapter 1: Introduction to Databases
    1. Overview of relational databases
      1. Files
      2. Relational databases
    2. Introduction to Structured Query Language
      1. Different categories of SQL
      2. Understanding the database schema
    3. Impact of intended usage patterns on database design
    4. Understanding relational theory
      1. Pillar 1 – Elements of a set are not ordered
      2. Pillar 2 – All elements in a set are unique
    5. Keys
      1. Types of keys
      2. Choosing the primary key
      3. Integrity
      4. The Check and Unique constraints
    6. Types of workload
      1. OLTP
      2. OLAP
    7. Summary
  8. Chapter 2: Entity Analysis
    1. Scope
      1. Project scope
      2. Product scope
    2. Understanding entity relationship diagrams
    3. Entities
      1. Understanding super- and sub-entities
      2. Naming entities
    4. Relationships
      1. Types of relationships
      2. Drawing conventions
      3. Recap
    5. Creating your first ERD
    6. Context of an ERD
    7. Summary
    8. Exercises
      1. Exercise 1 – student registration
      2. Exercise 2 – airline
  9. Chapter 3: Normalizing Data
    1. When to use normalization as a design strategy
      1. Considering all the details
    2. Preventing redundancy
      1. How to avoid redundancy
    3. The normalization steps
      1. Step zero
      2. First normal form
      3. Second normal form
      4. Third normal form
      5. Boyce-Codd and the fourth normal form
      6. Normalizing – a recap
    4. An alternative approach to normalizing data
      1. Step 1
      2. Step 2
      3. Step 3
      4. Step 4
    5. Integrating separate results
    6. Entity relationship diagram
    7. Summary
    8. Exercises
      1. Exercise 1 – Stock management of a bicycle shop
  10. Chapter 4: Provisioning and Implementing an Azure SQL DB
    1. Technical requirements
    2. Understanding SQL Server data types
      1. Numerical data
      2. Alphanumerical data
      3. Varying-length data types
      4. Dates
      5. Other data types
    3. Quantifying the data model
      1. Estimating the database size
      2. Analyzing expected usage patterns
    4. Provisioning an Azure SQL database
      1. Provisioned versus serverless
      2. vCores versus DTU
      3. Hyperscale and Business Critical
      4. Elastic pool
      5. Networking
      6. Additional settings
      7. Tags
      8. Review + create
    5. Connecting to the database
      1. Azure portal
      2. Azure Data Studio
    6. Data definition language
      1. Creating a table
      2. Altering a table
      3. Dropping a table
    7. Inserting data
    8. Indexing
      1. Clustered index
      2. Nonclustered index
      3. Automatic tuning
    9. Summary
  11. Chapter 5: Designing a NoSQL Database
    1. Understanding big data
    2. Understanding big data clusters
      1. Partitioning
    3. Getting to know Cosmos DB
      1. JSON
      2. Modeling JSON
      3. Using embedding versus referencing
      4. Referring to objects
      5. Cosmos DB partitioning
      6. Putting it together
    4. Key-value databases
      1. Modeling key-value databases
    5. Other NoSQL databases
      1. Gremlin
      2. Cassandra
    6. Extra considerations
      1. Polyglot persistence
      2. Concurrency
    7. Summary
    8. Exercise
  12. Chapter 6: Provisioning and Implementing an Azure Cosmos DB Database
    1. Technical requirements
    2. Provisioning a Cosmos DB database
      1. Basics
      2. Networking
      3. Backup policy
      4. Encryption
    3. Creating a container
    4. Uploading documents to a container
    5. Cosmos DB container settings
    6. Importing data using the Azure Cosmos DB Data Migration tool
    7. Summary
  13. Section 2 – Analytics with a Data Lake and Data Warehouse
  14. Chapter 7: Dimensional Modeling
    1. Background to dimensional modeling
      1. Performance
      2. Consistency
      3. Data quality
      4. The complexity of normalized database schemas
      5. Lack of historical data
    2. Understanding dimensional modeling
      1. Minimizing redundancy
      2. Using dependencies between attributes
      3. Understanding star schemas
      4. Understanding fact tables
      5. Understanding dimension tables
    3. Steps in dimensional modeling
      1. Choosing a process and defining the scope
      2. Determining the needed grain
      3. Determining the dimensions
      4. Determining the facts
    4. Designing dimensions
      1. Defining the primary key of a dimension table
      2. Adding an unknown member
      3. Creating star schemas versus creating snowflake schemas
      4. Implementing a date dimension
      5. Slowly changing dimensions
      6. Junk dimension
      7. Degenerate dimension
    5. Designing fact tables
      1. Understanding additive facts
      2. Understanding semi-additive facts
      3. Understanding non-additive facts
      4. Understanding transactional fact tables
      5. Understanding periodic snapshot fact tables
      6. Understanding accumulating snapshot fact tables
      7. Understanding the role-playing dimension
      8. Using a coverage fact table
    6. Using a Kimball data warehouse versus data marts
    7. Summary
    8. Exercise
  15. Chapter 8: Provisioning and Implementing an Azure Synapse SQL Pool
    1. Overview of Synapse Analytics
      1. Introducing SQL pools
      2. Introducing Spark pools
      3. Introducing data integration
    2. Provisioning a Synapse Analytics workspace
    3. Creating a dedicated SQL pool
    4. Implementing tables in Synapse SQL pools
      1. Using hash distribution
      2. Using replicated distribution
      3. Using ROUND_ROBIN distribution
      4. Implementing columnstore indexes
    5. Understanding workload management
      1. Creating a workload group
      2. Creating a workload classifier
    6. Using PolyBase to load data
      1. Enabling a SQL pool to access a data lake account
      2. Configuring and using PolyBase
      3. Using CTAS to import data
      4. Using COPY to import data
    7. Connecting to and using a dedicated SQL pool
      1. Working with Azure Data Studio
      2. Working with Power BI
    8. Summary
  16. Chapter 9: Data Vault Modeling
    1. Background to Data Vault modeling
    2. Designing Hub tables
      1. Defining the business key
      2. Implementing a hash key
      3. Adding the load date
      4. Adding the name of the source system
      5. Adding optional columns
    3. Designing Link tables
    4. Designing Satellite tables
      1. Adding optional columns to a Satellite
      2. Choosing the number of Satellites to use
    5. Using hash keys
    6. Designing a Data Vault structure
      1. Choosing the Hubs
      2. Choosing the Links
      3. Choosing the Satellites
    7. Designing business vaults
      1. Adding a Meta Mart
      2. Adding a Metrics Vault
      3. Adding an Error Mart
      4. Using Point-in-Time tables
      5. Adding Bridge tables
      6. Adding a hierarchical link
    8. Implementing a Data Vault
    9. Summary
    10. Exercise
  17. Chapter 10: Designing and Implementing a Data Lake Using Azure Storage
    1. Technical requirements
    2. Background of data lakes
    3. Modeling a data lake
      1. Defining data lake zones
      2. Defining a data lake folder structure
      3. Designing time slices
    4. Using different file formats
      1. AVRO file format
      2. Parquet file format
      3. ORC file format
    5. Choosing the proper file size
    6. Provisioning an Azure storage account
      1. Locally redundant storage (LRS)
      2. Zone-redundant storage (ZRS)
      3. Geo-redundant storage (GRS) and geo-zone-redundant storage (GZRS)
      4. Read-access geo-redundant storage (RA-GRS) and read-access geo-zone-redundant storage (RA-GZRS)
    7. Creating a data lake filesystem
    8. Creating multiple storage accounts
      1. Considering DTAP
      2. Considering data diversity
      3. Considering cost sensitivity
      4. Considering management overhead
    9. Summary
  18. Section 3 – ETL with Azure Data Factory
  19. Chapter 11: Implementing ETL Using Azure Data Factory
    1. Technical requirements
    2. Introducing Azure Data Factory
    3. Introducing the main components of Azure Data Factory
      1. Understanding activities
      2. Understanding datasets
      3. Understanding linked services
      4. Understanding pipelines
      5. Understanding triggers and integration runtimes
    4. Using the copy activity
      1. Copying a single table to the data lake
      2. Copying all tables to the data lake
    5. Implementing a data flow
    6. Executing SQL code from Data Factory
    7. Summary
    8. Why subscribe?
  20. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Data Modeling for Azure Data Services
  • Author(s): Peter ter Braake
  • Release date: July 2021
  • Publisher(s): Packt Publishing
  • ISBN: 9781801077347