Let's run a word count problem on stringRdd. Word count is the "Hello World" of the big data world: we count the number of occurrences of each word in the RDD.
First, we will create pairRDD as follows:
scala> val pairRDD = stringRdd.map(s => (s, 1))
pairRDD: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[6] at map at <console>:26
The pairRDD consists of (word, 1) pairs, where each word is a string from stringRdd and 1 (an integer) is its initial count.
Now, we will run the reduceByKey operation on this RDD to count the occurrences of each word as follows:
scala> val wordCountRDD = pairRDD.reduceByKey((x, y) => x + y)
wordCountRDD: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:28
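To see what these two steps compute without a Spark shell, the same logic can be sketched with plain Scala collections; reduceByKey behaves like grouping the pairs by key and summing the values. This is a local illustration only (the input words here are made up, not taken from stringRdd):

```scala
// A stand-in for stringRdd: a small, made-up collection of words
val words = Seq("spark", "scala", "spark", "rdd")

// Step 1: pair each word with 1, as map(s => (s, 1)) does on the RDD
val pairs = words.map(s => (s, 1))

// Step 2: sum the counts per key, mirroring reduceByKey((x, y) => x + y)
val wordCount = pairs.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }

println(wordCount) // prints a Map of word -> count
```

On a real RDD, reduceByKey additionally merges counts per partition before shuffling, so each distinct word crosses the network only once per partition.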
Now, let's ...