Working with Spark SQL

This chapter introduces Spark SQL and related concepts such as DataFrames and Datasets. It discusses schemas and advanced SQL functions from the Apache Spark perspective, and it also touches on writing custom user-defined functions (UDFs) and working with various data sources.

This chapter uses the Java APIs to create a SQLContext/SparkSession and to build DataFrames and Datasets from Java RDDs, both for raw data, such as CSV, and for structured data, such as JSON.
