Probabilistic data structures in Python
Use approximations with error bounds to trade a small amount of accuracy for savings in system resources such as memory or compute time, especially for large-scale analytics and streaming data.
![Paco Nathan](https://www.oreilly.com/content/wp-content/uploads/sites/2/2020/01/probabilistic-data-structures-5529df52c2a953bc3cac8f3d46cac4fc-5529df52c2a953bc3cac8f3d46cac4fc-5529df52c2a953bc3cac8f3d46cac4fc.png)
Probabilistic data structures represent a relatively new area of algorithms. You may hear the terms approximation algorithms, sketch algorithms, or online algorithms used to describe similar work. These approaches provide approximations with error bounds: the error bounds are calculated in advance, probabilistically, so that well-formed approximations are built directly into the data collections. Applications no longer need batch windows, stop-to-fit models, and the like. Instead, they can sample the needed results at any point in a real-time data stream.
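To make "approximations with error bounds" concrete, here is a minimal sketch of a Bloom filter, a widely known probabilistic data structure for set membership, chosen here purely for illustration. The class name `BloomFilter`, its `capacity` and `error_rate` parameters, and the `user-`/`other-` keys are hypothetical names made up for this sketch. It uses only the Python standard library and is not a production implementation.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, bounded false positives."""

    def __init__(self, capacity: int, error_rate: float):
        # The error bound is chosen in advance and determines the structure's
        # size, using the standard sizing formulas:
        #   m = -n * ln(p) / (ln 2)^2   bits
        #   k = (m / n) * ln 2          hash functions
        self.num_bits = math.ceil(
            -capacity * math.log(error_rate) / math.log(2) ** 2
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Target: up to 10,000 items with roughly a 1% false-positive rate.
bloom = BloomFilter(capacity=10_000, error_rate=0.01)

# Simulate a stream; the filter could be queried after any of these inserts.
for i in range(10_000):
    bloom.add(f"user-{i}")

print("user-42" in bloom)  # True: inserted items are never reported missing

# Measure how often items that were never inserted are wrongly reported present.
false_positives = sum(f"other-{i}" in bloom for i in range(10_000))
print(f"observed false-positive rate: {false_positives / 10_000:.4f}")
```

The design choice to note is that the error rate is an input, not a measurement taken afterward: it fixes the memory footprint (about 12 KB here, versus storing 10,000 strings) before any data arrives, and the observed false-positive rate should land close to the configured bound.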
This tutorial is intended for Python programmers with some background in big data who now need to apply probabilistic data structures to analytics on large-scale data, to streaming applications, and especially to use cases that require both. This notebook shows Python code for five of the better-known examples and explores their potential use cases.