Chapter 2. Choosing Aggregates

One of the most vexing tasks in deploying dimensional aggregates is choosing which aggregates to design and deploy. Your aim is to strike the correct balance between the performance gain provided by aggregate schemas and their cost in terms of resource requirements.

Several factors make it difficult to achieve this balance. The sheer number of possible aggregates is much larger than you might expect. Of course, they can't all be included in the data warehouse. But it would be exceedingly tedious to study each permutation of attributes to identify those aggregates that might produce useful cost savings. If your system is not yet in production, you do not have existing query patterns to optimize so you might not even be able to tell if the cost savings offered by a particular aggregate would ever be realized. And if you do have usage data, it does not tell you the relative importance of different queries. Last, you may find that you need to choose between the needs of different user constituencies.

This chapter shows you how to cut through these issues and determine which aggregates make sense to deploy in your situation.

The first section briefly examines how to describe an aggregate schema.

The next section gives you a variety of techniques you can use to identify aggregates that are likely to be of value in your implementation. These techniques are used to establish a pool of aggregates for potential inclusion in your data warehouse.

The last part ...

Get Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.