Part 3. Over the arc
Part 3 covers the missing pieces and documentation. In chapter 8, you’ll see algorithms you might expect to be part of the GraphX API but that aren’t as of Spark 1.6. From reading graph data in the standard RDF format to merging graphs, the algorithms in chapter 8 plug some of those holes.
Chapter 8 also covers how to use IndexedRDD, which is like the HashMap of RDDs. We go through an example showing how it can improve performance.
Finally, you’ll see an example of identifying likely missing data from Wikipedia using ideas from graph isomorphisms—finding pieces of graphs that are similar to each other.
Chapter 9 is all about putting GraphX into production and doing debugging and performance tuning. It steps you through tools ...