Chapter 5. Example Pipeline Architectures

Let’s take a look at some example architectures that illustrate concepts presented here, including:

  • Product recommendations using Spark, including some resources to investigate for real-time recommendations with Spark Streaming

  • Payment processing using Python’s Flask and scikit-learn to detect fraud and other financial crimes

  • A site reliability use case analyzing log data for anomalies using Elasticsearch, Logstash, and Kibana

  • Inference at the edge using classifiers taking advantage of special Nvidia hardware at embedded locations for computer vision

For each example outlined here, we will cover three aspects. First, the use case: what problem are you trying to solve? Second, the general architecture: how will you structure the various pieces of the data platform to achieve the desired end result? Last, we will look at concrete ideas for implementing the solution in Kubernetes.

Sample code and Kubernetes resource definitions (in YAML) are provided to point implementers in the right direction.

E-commerce: Product Recommendation

In e-commerce, recommendations are one of the most reliable ways for retailers to increase revenue. We can suggest items that users might buy based on data we’ve observed about their own past behavior or the behavior of others. For instance, we can use collaborative filtering to recommend products to users based on the items that profiles similar to the user in question ...
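
To make the idea of collaborative filtering concrete, here is a minimal sketch of the user-based variant in plain Python. It is not the chapter's Spark implementation; the `ratings` data, the user and item names, and the `cosine_similarity` and `recommend` helpers are all hypothetical and exist only for illustration. The sketch scores each item a user has not yet rated by a similarity-weighted average of the ratings that other users gave it.

```python
from math import sqrt

# Hypothetical data for illustration: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"tent": 5, "stove": 4, "lantern": 1},
    "bob": {"tent": 4, "stove": 5, "boots": 5},
    "carol": {"lantern": 5, "novel": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts (0.0 if no overlap)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    own = ratings[user]
    weighted_sum = {}  # item -> sum of (similarity * rating)
    weight_total = {}  # item -> sum of similarities
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(own, other_ratings)
        if sim <= 0:
            continue  # skip users with nothing in common
        for item, rating in other_ratings.items():
            if item in own:
                continue  # only suggest items the user has not rated
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    ranked = sorted(
        ((weighted_sum[i] / weight_total[i], i) for i in weighted_sum),
        reverse=True,
    )
    return [item for _, item in ranked[:top_n]]


print(recommend("alice", ratings))
```

In this toy data set, "bob" is the profile most similar to "alice", so the item he rated that she has not seen ranks first. At production scale the pairwise loop would typically be replaced by a distributed approach, which is one reason a framework such as Spark is a common choice for this use case.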
