Chapter 3. How Data Virtualization Systems Work

In this chapter, we give a brief tutorial on data virtualization systems: how they are architected, how data flows through the system in response to a request, and generally how they work. We focus specifically on the query processing engine within the system, which we call the DV Engine throughout this chapter. We do not expect a reader to be able to build a DV Engine after reading the chapter. Such an effort requires years of training in advanced systems engineering, taught at places such as the University of Maryland (the home institution of one of the authors of this book), along with real-world experience working on existing complex systems.

Rather, our goal is to give the reader an overview of how such systems are built, arming them with the knowledge to avoid issues that come up during system selection and deployment. We start with fundamental architectural principles and then continue with more advanced topics in the next chapter.

Designing a DV Engine involves trade-offs. Existing engines choose particular points in the trade-off space depending on how their designers expect the DV System to be used. If the system is used in a way it was not designed for, poor performance and other practical problems often result. Therefore, it is important to be aware of the trade-offs that exist, the assumptions made, and the general design of the engine. This results in better ...
