8

Groupby Operations: Split-Apply-Combine

Grouped operations are a powerful way to aggregate, transform, and filter data. They rely on the mantra of “split–apply–combine”:

  1. Data is split into separate parts based on key(s).

  2. A function is applied to each part of the data.

  3. The results from each part are combined to create a new data set.
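The three steps above can be sketched by hand and then compared with a single .groupby() call. This is a minimal illustration, not the book's own example: the toy data and the column names (continent, life_exp) are assumptions made up for the sketch.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia", "Europe", "Africa"],
    "life_exp": [70.0, 80.0, 74.0, 78.0, 60.0],
})

# 1. Split: build one sub-DataFrame per unique key value.
parts = {
    key: df[df["continent"] == key]
    for key in df["continent"].unique()
}

# 2. Apply: compute the same function (here, the mean) on each part.
means = {key: part["life_exp"].mean() for key, part in parts.items()}

# 3. Combine: gather the per-part results into a single Series.
manual = pd.Series(means, name="life_exp").sort_index()

# The same calculation expressed as one grouped operation.
grouped = df.groupby("continent")["life_exp"].mean()

print(manual)
print(grouped)
```

Both results hold one mean per continent. The manual version only makes the three steps visible; in practice the grouped call is shorter and avoids filtering the data once per key.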

This concept is powerful because the original data can be broken into independent parts, and the calculation can then be performed on each part separately. If you have worked with databases, you should recognize that the Pandas .groupby() works much like the SQL GROUP BY clause. The split–apply–combine concept is also heavily used in “big data” systems that use distributed computing, with the data being split into independent parts ...
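To make the comparison with SQL concrete, the sketch below runs the same aggregation twice: once as a GROUP BY query against an in-memory SQLite database, and once with .groupby(). The table name (countries), the column names, and the data are all hypothetical and chosen only for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data; table and column names are illustrative only.
df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia", "Europe", "Africa"],
    "life_exp": [70.0, 80.0, 74.0, 78.0, 60.0],
})

# Load the data into an in-memory SQLite database and aggregate with SQL.
conn = sqlite3.connect(":memory:")
df.to_sql("countries", conn, index=False)
sql_result = pd.read_sql_query(
    """
    SELECT continent, AVG(life_exp) AS life_exp
    FROM countries
    GROUP BY continent
    ORDER BY continent
    """,
    conn,
)
conn.close()

# The equivalent grouped aggregation in Pandas.
# as_index=False keeps the key as a regular column, as in the SQL result.
pandas_result = df.groupby("continent", as_index=False)["life_exp"].mean()

print(sql_result)
print(pandas_result)
```

One difference to keep in mind: by default, .groupby() drops rows whose key is missing (dropna=True), whereas SQL GROUP BY collects NULL keys into a group of their own.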
