Chapter 3. Querying Variant Data in SQL

Though the concept of “data warehouses” has existed since the 1980s, their popularity really started to grow in the 1990s, when two individuals, Bill Inmon and Ralph Kimball, separately published books on building and organizing data in data warehouses.

If you’re familiar with traditional databases (like SQL Server or PostgreSQL), which we’ll cover in the next section, you know that they contain tables of information that are related by IDs or keys. This allows you to join tables together and query the information as you wish. Databases are good for storage of data from a single system, and they support tons of concurrent connections. For example, think of a database behind a website that supports thousands of quick data reads and writes.

Data warehouses, on the other hand, have a different purpose. They are often used for the unification and aggregation of data from multiple source systems, joining together data for reporting and more. Instead of expecting thousands of data retrieval connections, they instead expect to perform a smaller number of complex queries to aggregate and join across a series of tables.

The difference with data warehouses is how the data in the tables is organized. We use the terms facts and dimensions as follows:

Fact: This is a value, metric, or measure that varies between entities in a data warehouse. Thus, fact tables are often long (having many rows), with only columns to hold the values and key columns to relate ...

Get Genomics in the Azure Cloud now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Genomics in the Azure Cloud by Colby T. Ford

Chapter 3. Querying Variant Data in SQL

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly