Chapter 23. Data Warehouses Are the Past, Present, and Future

James Densmore

The death of the data warehouse, long prophesied, seems to be always on the horizon yet never realized. First it was NoSQL, then Hadoop, then data lakes that would kill the data warehouse. Yet here we are. Snowflake was the hottest initial public offering (IPO) of 2020, and the demand for data and analytics engineers who can crank value out of a data warehouse is as high as ever.

In 2010, the future of data warehouses felt pretty bleak. Most analytics teams were relying on traditional row-based, online transaction processing (OLTP) databases for their data warehouses, even as data volume was exploding. When it came to processing and querying all that data for analysis, columnar databases came to the rescue, but they required ever more hardware.

While bare-metal data warehouse appliances provided a massive jump in processing power, adding that hardware to your server room was quite an investment. Ten years later, an investment like that is unimaginable.

Things changed for the better in 2012, when Amazon launched Redshift, a columnar data warehouse built on top of PostgreSQL that you could spin up in minutes and pay for in small increments, with no massive up-front cost.

Migrations away from overtaxed, row-based SQL data warehouses to Redshift grew massively. The barrier to entry for a high-performing data warehouse was lowered substantially, ...

