Chapter 8. Accessing External Data

Chapter 4 introduced the notion of an external table, a metadata object that describes the location and access method of data not stored natively in Greenplum. The original use was to facilitate the speedy parallel loading of data into Greenplum.

Another use case for external tables is integrating external data with native Greenplum data in federated queries. As Greenplum 5 incorporated PostgreSQL 8.4 and Greenplum 6 incorporated PostgreSQL 9.4, it acquired native PostgreSQL external data capabilities, namely dblinks and foreign data wrappers. In addition, Greenplum expanded the now-deprecated gphdfs protocol into a more general and capable method of accessing external files on HDFS, a protocol known as the Platform Extension Framework (PXF). The combination of these facilities allows users to work with a wide variety of external data.

dblink

Although PostgreSQL allows a single SQL statement to access data in different schemas of one database, it does not permit cross-database references without a database link, or dblink.

New in Greenplum Version 5

dblinks are new in Greenplum 5.

Introduced into PostgreSQL in 8.4, dblink makes cross-database queries easier. It is intended for database users who need to run short ad hoc queries against other databases. It is not intended as a replacement for external tables or for administrative tools such as gpcopy. As with all powerful tools, DBAs must be careful with granting this ...
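
As a rough illustration of the short ad hoc query that dblink is meant for, the following sketch opens a named connection to a second database, runs one aggregate query there, and closes the connection. The database name sales_db, the connection name sales_conn, and the orders table are hypothetical. The sketch also assumes a Greenplum 6 system, where dblink is available as an extension, and a role that is allowed to connect without supplying a password.

```sql
-- Assumption: Greenplum 6, where dblink is packaged as an extension.
CREATE EXTENSION IF NOT EXISTS dblink;

-- Open a named connection to another database in the same cluster.
-- sales_db and sales_conn are hypothetical names.
SELECT dblink_connect('sales_conn', 'dbname=sales_db');

-- Run a short ad hoc query in the other database. dblink returns
-- SETOF record, so the caller must declare column names and types.
SELECT region, order_count
FROM dblink('sales_conn',
            'SELECT region, count(*) FROM orders GROUP BY region')
     AS remote(region text, order_count bigint);

-- Close the named connection when finished.
SELECT dblink_disconnect('sales_conn');
```

Because the result of dblink() is an ordinary row set once its columns are declared, it can be joined to native Greenplum tables in the same statement, which is what makes it convenient for occasional lookups rather than bulk data movement.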
