© Hien Luu 2018
Hien Luu, Beginning Apache Spark 2, https://doi.org/10.1007/978-1-4842-3579-9_4

4. Spark SQL (Foundations)

Hien Luu
San Jose, California, USA

As Spark evolves as a unified data processing engine, adding features with each release, its programming abstractions evolve too. The RDD was the initial core programming abstraction when Spark was introduced to the world in 2012. Spark 1.6 introduced a new set of programming abstractions called the Structured APIs, which are now the preferred way to perform data processing for the majority of use cases. The Structured APIs were designed to enhance developers’ productivity with easy-to-use, intuitive, and expressive APIs. In this new way of doing data processing, the data needs to be organized ...
