Probabilistic data structures in Python
Use approximations with error bounds to trade-off system resources, e.g., memory or compute time -- especially for large-scale analytics and streaming data.
Probabilistic data structures represent a relatively new area of algorithms. You may hear the terms approximation algorithms, sketch algorithms, or online algorithms used to describe similar work. These approaches provide approximations with error bounds: calculate the error bounds in advance, probabilistically, so that well-formed approximations get built into data collections directly. Apps no longer need batch windows, stop-to-fit models, etc. Instead they can sample the needed results at any point in a real-time data stream.
This tutorial is intended for a Python programmer who has some background working with big data, who now needs to learn how to apply probabilistic data structures for analytics with large-scale data and streaming applications, and especially for use cases that require both. This notebook shows Python code examples for five of the more well-known examples and explores their potential use cases.
Here’s a clip: