© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
H. Luu, Beginning Apache Spark 3, https://doi.org/10.1007/978-1-4842-7383-8_3

3. Spark SQL: Foundation

Hien Luu
San Jose, CA, USA

As Spark evolves and matures as a unified data processing engine with more features in each new release, its programming abstraction also evolves. The resilient distributed dataset (RDD) was the initial core programming abstraction when Spark was introduced to the world in 2012. In Spark version 1.6, a new programming abstraction called Structured APIs was introduced. This is the new and preferred way to handle data engineering tasks such as performing data processing or building data pipelines. The Structured APIs were designed ...
