This chapter introduces Spark SQL and related concepts such as the dataframe and the dataset. It discusses schemas and advanced SQL functions from the Apache Spark perspective, and also touches on writing custom user-defined functions (UDFs) and working with various data sources.
This chapter uses the Java APIs to create a SQLContext/SparkSession and to build dataframes/datasets from a Java RDD, both for raw data, such as CSV, and for structured data, such as JSON.