Left anti join

Left anti join results in rows from only statesPopulationDF if, and only if, there is NO corresponding row in statesTaxRatesDF.

Join the two datasets by the State column as follows:

val joinDF = statesPopulationDF.join(statesTaxRatesDF, statesPopulationDF("State") === statesTaxRatesDF("State"), "leftanti")%sqlval joinDF = spark.sql("SELECT * FROM statesPopulationDF LEFT ANTI JOIN statesTaxRatesDF ON statesPopulationDF.State = statesTaxRatesDF.State")scala> joinDF.countres22: Long = 28scala> joinDF.show(5)+--------+----+----------+| State|Year|Population|+--------+----+----------+| Alaska|2010| 714031||Delaware|2010| 899816| ...

Get Scala and Spark for Big Data Analytics now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.