Probabilistic data structures in Python

Use approximations with error bounds to trade-off system resources, e.g., memory or compute time -- especially for large-scale analytics and streaming data.

By Paco Nathan

August 1, 2016

Paco Nathan (source: O'Reilly Media)

Probabilistic data structures represent a relatively new area of algorithms. You may hear the terms approximation algorithms, sketch algorithms, or online algorithms used to describe similar work. These approaches provide approximations with error bounds: calculate the error bounds in advance, probabilistically, so that well-formed approximations get built into data collections directly. Apps no longer need batch windows, stop-to-fit models, etc. Instead they can sample the needed results at any point in a real-time data stream.

This tutorial is intended for a Python programmer who has some background working with big data, who now needs to learn how to apply probabilistic data structures for analytics with large-scale data and streaming applications, and especially for use cases that require both. This notebook shows Python code examples for five of the more well-known examples and explores their potential use cases.

Learn faster. Dig deeper. See farther.

Join the O'Reilly online learning platform. Get a free trial today and find answers on the fly, or master something new and useful.

Learn more

Here’s a clip:

Click here to see the list of tutorials.

Post topics: Data science