Python Feature Engineering Cookbook - Second Edition

Book description

Create end-to-end, reproducible feature engineering pipelines that can be deployed into production using open-source Python libraries

Key Features

  • Learn and implement feature engineering best practices
  • Reinforce your learning with the help of multiple hands-on recipes
  • Build end-to-end feature engineering pipelines that are performant and reproducible

Book Description

Feature engineering, the process of transforming variables and creating features, can be time-consuming, but it is essential for your machine learning models to perform well. This second edition of Python Feature Engineering Cookbook will take the struggle out of feature engineering by showing you how to use open source Python libraries to accelerate the process via a plethora of practical, hands-on recipes.

This updated edition begins by addressing fundamental data challenges such as missing data and categorical variables, before moving on to strategies for dealing with skewed distributions and outliers. The concluding chapters show you how to develop new features from various types of data, including text, time series, and relational databases. With the help of numerous open source Python libraries, you'll learn how to implement each feature engineering method in a performant, reproducible, and elegant manner.

By the end of this Python book, you will have the tools and expertise needed to confidently build end-to-end, reproducible feature engineering pipelines that can be deployed into production.

What you will learn

  • Impute missing data using various univariate and multivariate methods (see the short sketch after this list)
  • Encode categorical variables with one-hot, ordinal, and count encoding
  • Handle high-cardinality categorical variables
  • Transform, discretize, and scale your variables
  • Create variables from date and time with pandas and Feature-engine
  • Combine variables into new features
  • Extract features from text as well as from transactional data with Featuretools
  • Create features from time series data with tsfresh
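
As a taste of the recipes behind these bullets, below is a minimal sketch of univariate median imputation using scikit-learn's SimpleImputer; the toy DataFrame is invented for illustration and is not one of the book's datasets:

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer

    # Toy DataFrame with missing values (illustrative data, not from the book)
    df = pd.DataFrame({
        "age": [25, np.nan, 40, 31],
        "income": [52000, 48000, np.nan, 61000],
    })

    # Univariate median imputation: NaNs in each column are replaced
    # with that column's median, learned during fit
    imputer = SimpleImputer(strategy="median")
    df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    print(df_imputed)

Chapter 1 walks through this and other imputation strategies in detail, including multivariate approaches such as imputation by chained equations and nearest-neighbor estimation.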

Who this book is for

This book is for machine learning and data science students and professionals, as well as software engineers working on machine learning model deployment, who want to learn how to transform their data and create new features to train better machine learning models.

Table of contents

  1. Python Feature Engineering Cookbook
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
    4. Download the color images
    5. Conventions used
    6. Sections
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There’s more…
      5. See also
    7. Get in touch
    8. Reviews
    9. Share Your Thoughts
    10. Download a Free PDF copy of this book
  6. Chapter 1: Imputing Missing Data
    1. Technical requirements
    2. Removing observations with missing data
      1. How to do it...
      2. How it works...
    3. Performing mean or median imputation
      1. How to do it...
      2. How it works...
    4. Imputing categorical variables
      1. How to do it...
      2. How it works...
    5. Replacing missing values with an arbitrary number
      1. How to do it...
      2. How it works...
    6. Finding extreme values for imputation
      1. How to do it...
      2. How it works...
    7. Marking imputed values
      1. How to do it...
      2. How it works...
    8. Performing multivariate imputation by chained equations
      1. How to do it...
      2. How it works...
      3. See also
    9. Estimating missing data with nearest neighbors
      1. How to do it...
      2. How it works...
  7. Chapter 2: Encoding Categorical Variables
    1. Technical requirements
    2. Creating binary variables through one-hot encoding
      1. How to do it...
      2. How it works...
      3. There’s more...
    3. Performing one-hot encoding of frequent categories
      1. How to do it...
      2. How it works...
      3. There’s more...
    4. Replacing categories with counts or the frequency of observations
      1. How to do it...
      2. How it works...
    5. Replacing categories with ordinal numbers
      1. How to do it...
      2. How it works...
      3. There’s more...
    6. Performing ordinal encoding based on the target value
      1. How to do it...
      2. How it works...
      3. See also
    7. Implementing target mean encoding
      1. How to do it...
      2. How it works…
      3. There’s more…
    8. Encoding with the Weight of Evidence
      1. How to do it...
      2. How it works...
      3. See also
    9. Grouping rare or infrequent categories
      1. How to do it...
      2. How it works...
    10. Performing binary encoding
      1. How to do it...
      2. How it works...
      3. See also
  8. Chapter 3: Transforming Numerical Variables
    1. Transforming variables with the logarithm function
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There’s more…
    2. Transforming variables with the reciprocal function
      1. How to do it...
      2. How it works...
    3. Using the square root to transform variables
      1. How to do it...
      2. How it works…
    4. Using power transformations
      1. How to do it...
      2. How it works...
    5. Performing Box-Cox transformation
      1. How to do it...
      2. How it works...
      3. There’s more…
    6. Performing Yeo-Johnson transformation
      1. How to do it...
      2. How it works...
      3. There’s more…
  9. Chapter 4: Performing Variable Discretization
    1. Technical requirements
    2. Performing equal-width discretization
      1. How to do it...
      2. How it works…
      3. See also
    3. Implementing equal-frequency discretization
      1. How to do it...
      2. How it works…
    4. Discretizing the variable into arbitrary intervals
      1. How to do it...
      2. How it works...
    5. Performing discretization with k-means clustering
      1. How to do it...
      2. How it works...
      3. See also
    6. Implementing feature binarization
      1. Getting ready
      2. How to do it...
      3. How it works…
    7. Using decision trees for discretization
      1. How to do it...
      2. How it works...
      3. There’s more...
  10. Chapter 5: Working with Outliers
    1. Technical requirements
    2. Visualizing outliers with boxplots
      1. How to do it...
      2. How it works…
    3. Finding outliers using the mean and standard deviation
      1. How to do it...
      2. How it works…
    4. Finding outliers with the interquartile range proximity rule
      1. How to do it...
      2. How it works…
    5. Removing outliers
      1. How to do it...
      2. How it works...
    6. Capping or censoring outliers
      1. How to do it...
      2. How it works...
      3. There’s more...
    7. Capping outliers using quantiles
      1. How to do it...
      2. How it works...
  11. Chapter 6: Extracting Features from Date and Time Variables
    1. Technical requirements
    2. Extracting features from dates with pandas
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There’s more…
      5. See also
    3. Extracting features from time with pandas
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There’s more…
    4. Capturing the elapsed time between datetime variables
      1. How to do it...
      2. How it works...
      3. See also
    5. Working with time in different time zones
      1. How to do it...
      2. How it works...
      3. See also
    6. Automating feature extraction with Feature-engine
      1. How to do it...
      2. How it works...
  12. Chapter 7: Performing Feature Scaling
    1. Technical requirements
    2. Standardizing the features
      1. How to do it...
      2. How it works...
    3. Scaling to the maximum and minimum values
      1. How to do it...
      2. How it works...
    4. Scaling with the median and quantiles
      1. How to do it...
      2. How it works...
    5. Performing mean normalization
      1. How to do it...
      2. How it works…
      3. There’s more...
    6. Implementing maximum absolute scaling
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There’s more...
    7. Scaling to vector unit length
      1. How to do it...
      2. How it works...
  13. Chapter 8: Creating New Features
    1. Technical requirements
    2. Combining features with mathematical functions
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Comparing features to reference variables
      1. How to do it…
      2. How it works...
      3. See also
    4. Performing polynomial expansion
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There’s more...
      5. See also
    5. Combining features with decision trees
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Creating periodic features from cyclical variables
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. See also
    7. Creating spline features
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. See also
  14. Chapter 9: Extracting Features from Relational Data with Featuretools
    1. Technical requirements
    2. Setting up an entity set and creating features automatically
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Creating features with general and cumulative operations
      1. Getting ready
      2. How to do it...
      3. How it works...
    4. Combining numerical features
      1. How to do it...
      2. How it works...
    5. Extracting features from date and time
      1. How to do it...
      2. How it works...
      3. There’s more...
    6. Extracting features from text
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Creating features with aggregation primitives
      1. Getting ready
      2. How to do it...
      3. How it works...
  15. Chapter 10: Creating Features from a Time Series with tsfresh
    1. Technical requirements
    2. Extracting features automatically from a time series
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Creating and selecting features for a time series
      1. How to do it...
      2. How it works...
      3. See also
    4. Tailoring feature creation to different time series
      1. How to do it...
      2. How it works...
    5. Creating pre-selected features
      1. How to do it...
      2. How it works...
    6. Embedding feature creation in a scikit-learn pipeline
      1. How to do it...
      2. How it works...
      3. See also
  16. Chapter 11: Extracting Features from Text Variables
    1. Technical requirements
    2. Counting characters, words, and vocabulary
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There’s more...
      5. See also
    3. Estimating text complexity by counting sentences
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There’s more...
    4. Creating features with bag-of-words and n-grams
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Implementing term frequency-inverse document frequency
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Cleaning and stemming text variables
      1. Getting ready
      2. How to do it...
      3. How it works...
  17. Index
    1. Why subscribe?
  18. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a Free PDF copy of this book

Product information

  • Title: Python Feature Engineering Cookbook - Second Edition
  • Author(s): Soledad Galli
  • Release date: October 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781804611302