Book description
Over 95 hands-on recipes to leverage the power of pandas for efficient scientific computation and data analysis
About This Book
- Use the power of pandas to solve the most complex scientific computing problems with ease
- Leverage fast, robust data structures in pandas to gain useful insights from your data
- Practical, easy-to-implement recipes for quick solutions to common data problems using pandas
Who This Book Is For
This book is for data scientists, analysts and Python developers who wish to explore data analysis and scientific computing in a practical, hands-on manner. The recipes included in this book are suitable for both novice and advanced users, and contain helpful tips, tricks and caveats wherever necessary. Some understanding of pandas will be helpful, but not mandatory.
What You Will Learn
- Master the fundamentals of pandas to quickly begin exploring any dataset
- Isolate any subset of data by properly selecting and querying the data
- Split data into independent groups before applying aggregations and transformations to each group
- Restructure data into tidy form to make data analysis and visualization easier
- Prepare real-world messy datasets for machine learning
- Combine and merge data from different sources through pandas' SQL-like operations
- Utilize pandas' unparalleled time series functionality
- Create beautiful and insightful visualizations through pandas' direct hooks to Matplotlib and Seaborn
In Detail
This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way.
The pandas library is massive, and it’s common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands as one would during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter.
Many advanced recipes combine several different features across the pandas library to generate results.
Style and approach
The author draws on his extensive experience teaching pandas in a professional setting to deliver detailed explanations for each line of code in every recipe. All code and dataset explanations are provided in Jupyter Notebooks, an excellent interface for exploring data.
Table of contents
- Title Page
- Copyright
- Credits
- About the Author
- Acknowledgement
- About the Reviewers
- www.PacktPub.com
- Customer Feedback
- Preface
- Pandas Foundations
  - Introduction
  - Dissecting the anatomy of a DataFrame
  - Accessing the main DataFrame components
  - Understanding data types
  - Selecting a single column of data as a Series
  - Calling Series methods
  - Working with operators on a Series
  - Chaining Series methods together
  - Making the index meaningful
  - Renaming row and column names
  - Creating and deleting columns
- Essential DataFrame Operations
  - Introduction
  - Selecting multiple DataFrame columns
  - Selecting columns with methods
  - Ordering column names sensibly
  - Operating on the entire DataFrame
  - Chaining DataFrame methods together
  - Working with operators on a DataFrame
  - Comparing missing values
  - Transposing the direction of a DataFrame operation
  - Determining college campus diversity
- Beginning Data Analysis
- Selecting Subsets of Data
- Boolean Indexing
  - Introduction
  - Calculating boolean statistics
  - Constructing multiple boolean conditions
  - Filtering with boolean indexing
  - Replicating boolean indexing with index selection
  - Selecting with unique and sorted indexes
  - Gaining perspective on stock prices
  - Translating SQL WHERE clauses
  - Determining the normality of stock market returns
  - Improving readability of boolean indexing with the query method
  - Preserving Series with the where method
  - Masking DataFrame rows
  - Selecting with booleans, integer location, and labels
- Index Alignment
- Grouping for Aggregation, Filtration, and Transformation
  - Introduction
  - Defining an aggregation
  - Grouping and aggregating with multiple columns and functions
  - Removing the MultiIndex after grouping
  - Customizing an aggregation function
  - Customizing aggregating functions with *args and **kwargs
  - Examining the groupby object
  - Filtering for states with a minority majority
  - Transforming through a weight loss bet
  - Calculating weighted mean SAT scores per state with apply
  - Grouping by continuous variables
  - Counting the total number of flights between cities
  - Finding the longest streak of on-time flights
- Restructuring Data into a Tidy Form
  - Introduction
  - Tidying variable values as column names with stack
  - Tidying variable values as column names with melt
  - Stacking multiple groups of variables simultaneously
  - Inverting stacked data
  - Unstacking after a groupby aggregation
  - Replicating pivot_table with a groupby aggregation
  - Renaming axis levels for easy reshaping
  - Tidying when multiple variables are stored as column names
  - Tidying when multiple variables are stored as column values
  - Tidying when two or more values are stored in the same cell
  - Tidying when variables are stored in column names and values
  - Tidying when multiple observational units are stored in the same table
- Combining Pandas Objects
- Time Series Analysis
  - Introduction
  - Understanding the difference between Python and pandas date tools
  - Slicing time series intelligently
  - Using methods that only work with a DatetimeIndex
  - Counting the number of weekly crimes
  - Aggregating weekly crime and traffic accidents separately
  - Measuring crime by weekday and year
  - Grouping with anonymous functions with a DatetimeIndex
  - Grouping by a Timestamp and another column
  - Finding the last time crime was 20% lower with merge_asof
- Visualization with Matplotlib, Pandas, and Seaborn
  - Introduction
  - Getting started with matplotlib
  - Visualizing data with matplotlib
  - Plotting basics with pandas
  - Visualizing the flights dataset
  - Stacking area charts to discover emerging trends
  - Understanding the differences between seaborn and pandas
  - Doing multivariate analysis with seaborn Grids
  - Uncovering Simpson's paradox in the diamonds dataset with seaborn
Product information
- Title: Pandas Cookbook
- Author(s):
- Release date: October 2017
- Publisher(s): Packt Publishing
- ISBN: 9781784393878