Python Feature Engineering Cookbook

Book description

Extract accurate information from data to train and improve machine learning models using NumPy, SciPy, pandas, and scikit-learn libraries

Key Features

  • Discover solutions for feature generation, feature extraction, and feature selection
  • Uncover the end-to-end feature engineering process across continuous, discrete, and unstructured datasets
  • Implement modern feature extraction techniques using Python's pandas, scikit-learn, SciPy, and NumPy libraries

Book Description

Feature engineering is invaluable for developing and enriching your machine learning models. In this cookbook, you will work with the best tools to streamline your feature engineering pipelines and techniques, simplify your code, and improve its quality.

Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you'll learn how to work with both continuous and discrete datasets and how to transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. This book covers Python recipes that will help you automate feature engineering to simplify complex processes. You'll also get to grips with different feature engineering strategies, such as the Box-Cox transform, power transform, and log transform, across machine learning, reinforcement learning, and natural language processing (NLP) domains.
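
As a rough illustration of the transforms named above (not a recipe taken from the book), the following sketch applies a log transform and a Box-Cox transform to a synthetic, strictly positive, right-skewed sample and compares the skewness before and after. The data and the variable names are assumptions made for this example only.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed, strictly positive sample (illustrative data only).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Log transform: only defined for strictly positive values.
x_log = np.log(x)

# Box-Cox transform: also requires positive values. When no lambda is
# passed, SciPy estimates it by maximum likelihood and returns it.
x_boxcox, fitted_lambda = stats.boxcox(x)

# Skewness near zero suggests a more symmetric distribution.
skew_before = stats.skew(x)
skew_after_log = stats.skew(x_log)
skew_after_boxcox = stats.skew(x_boxcox)

print(f"skew before:        {skew_before:.2f}")
print(f"skew after log:     {skew_after_log:.2f}")
print(f"skew after Box-Cox: {skew_after_boxcox:.2f} (lambda={fitted_lambda:.2f})")
```

Both transforms assume positive inputs; for variables that contain zeros or negative values, a transform such as Yeo-Johnson is usually the safer choice.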

By the end of this book, you'll have discovered tips and practical solutions to all of your feature engineering problems.

What you will learn

  • Simplify your feature engineering pipelines with powerful Python packages
  • Get to grips with imputing missing values
  • Encode categorical variables with a wide set of techniques
  • Extract insights from text quickly and effortlessly
  • Develop features from transactional data and time series data
  • Derive new features by combining existing variables
  • Understand how to transform, discretize, and scale your variables
  • Create informative variables from date and time
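
To give a flavor of a few of the topics listed above, here is a minimal pandas sketch covering a missing value indicator, median imputation, a bespoke "Missing" category, one-hot encoding, and date-derived variables. The toy DataFrame and its column names are hypothetical and are not taken from the book.

```python
import pandas as pd

# Hypothetical toy dataset (illustrative values only).
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "city": ["Lima", "Oslo", None, "Lima"],
    "signup": pd.to_datetime(
        ["2019-01-15", "2019-03-02", "2019-03-20", "2019-12-31"]
    ),
})

# Flag missing values before imputing so the information is not lost.
df["age_missing"] = df["age"].isna().astype(int)

# Median imputation for the numerical variable.
df["age"] = df["age"].fillna(df["age"].median())

# Capture missing categories in a bespoke "Missing" label.
df["city"] = df["city"].fillna("Missing")

# One-hot encode the categorical variable.
encoded = pd.get_dummies(df, columns=["city"])

# Derive simple features from the datetime variable.
encoded["signup_month"] = encoded["signup"].dt.month
encoded["signup_dayofweek"] = encoded["signup"].dt.dayofweek

print(encoded.head())
```

In a real project the median would be learned from the training set only and then reused on new data, which is what pipeline tools such as those in scikit-learn are designed to handle.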

Who this book is for

This book is for machine learning professionals, AI engineers, data scientists, and NLP and reinforcement learning engineers who want to optimize and enrich their machine learning models with the best features. Knowledge of machine learning and Python coding will help you understand the concepts covered in this book.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Python Feature Engineering Cookbook
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Sections
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
    5. Get in touch
      1. Reviews
  6. Foreseeing Variable Problems When Building ML Models
    1. Technical requirements
    2. Identifying numerical and categorical variables
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Quantifying missing data
      1. Getting ready
      2. How to do it...
      3. How it works...
    4. Determining cardinality in categorical variables
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Pinpointing rare categories in categorical variables
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Identifying a linear relationship
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    7. Identifying a normal distribution
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    8. Distinguishing variable distribution
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    9. Highlighting outliers
      1. Getting ready
      2. How to do it...
      3. How it works...
    10. Comparing feature magnitude
      1. Getting ready
      2. How to do it...
      3. How it works...
  7. Imputing Missing Data
    1. Technical requirements
    2. Removing observations with missing data
      1. How to do it...
      2. How it works...
      3. See also
    3. Performing mean or median imputation
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    4. Implementing mode or frequent category imputation
      1. How to do it...
      2. How it works...
      3. See also
    5. Replacing missing values with an arbitrary number
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    6. Capturing missing values in a bespoke category
      1. How to do it...
      2. How it works...
      3. See also
    7. Replacing missing values with a value at the end of the distribution
      1. How to do it...
      2. How it works...
      3. See also
    8. Implementing random sample imputation
      1. How to do it...
      2. How it works...
      3. See also
    9. Adding a missing value indicator variable
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    10. Performing multivariate imputation by chained equations
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    11. Assembling an imputation pipeline with scikit-learn
      1. How to do it...
      2. How it works...
      3. See also
    12. Assembling an imputation pipeline with Feature-engine
      1. How to do it...
      2. How it works...
      3. See also
  8. Encoding Categorical Variables
    1. Technical requirements
    2. Creating binary variables through one-hot encoding
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Performing one-hot encoding of frequent categories
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Replacing categories with ordinal numbers
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    5. Replacing categories with counts or frequency of observations
      1. How to do it...
      2. How it works...
      3. There's more...
    6. Encoding with integers in an ordered manner
      1. How to do it...
      2. How it works...
      3. See also
    7. Encoding with the mean of the target
      1. How to do it...
      2. How it works...
      3. See also
    8. Encoding with the Weight of Evidence
      1. How to do it...
      2. How it works...
      3. See also
    9. Grouping rare or infrequent categories
      1. How to do it...
      2. How it works...
      3. See also
    10. Performing binary encoding
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    11. Performing feature hashing
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  9. Transforming Numerical Variables
    1. Technical requirements
    2. Transforming variables with the logarithm
      1. How to do it...
      2. How it works...
      3. See also
    3. Transforming variables with the reciprocal function
      1. How to do it...
      2. How it works...
      3. See also
    4. Using square and cube root to transform variables
      1. How to do it...
      2. How it works...
      3. There's more...
    5. Using power transformations on numerical variables
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    6. Performing Box-Cox transformation on numerical variables
      1. How to do it...
      2. How it works...
      3. See also
    7. Performing Yeo-Johnson transformation on numerical variables
      1. How to do it...
      2. How it works...
      3. See also
  10. Performing Variable Discretization
    1. Technical requirements
    2. Dividing the variable into intervals of equal width
      1. How to do it...
      2. How it works...
      3. See also
    3. Sorting the variable values in intervals of equal frequency
      1. How to do it...
      2. How it works...
    4. Performing discretization followed by categorical encoding
      1. How to do it...
      2. How it works...
      3. See also
    5. Allocating the variable values in arbitrary intervals
      1. How to do it...
      2. How it works...
    6. Performing discretization with k-means clustering
      1. How to do it...
      2. How it works...
    7. Using decision trees for discretization
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  11. Working with Outliers
    1. Technical requirements
    2. Trimming outliers from the dataset
      1. How to do it...
      2. How it works...
      3. There's more...
    3. Performing winsorization
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    4. Capping the variable at arbitrary maximum and minimum values
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    5. Performing zero-coding – capping the variable at zero
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
  12. Deriving Features from Dates and Time Variables
    1. Technical requirements
    2. Extracting date and time parts from a datetime variable
      1. How to do it...
      2. How it works...
      3. See also
    3. Deriving representations of the year and month
      1. How to do it...
      2. How it works...
      3. See also
    4. Creating representations of day and week
      1. How to do it...
      2. How it works...
      3. See also
    5. Extracting time parts from a time variable
      1. How to do it...
      2. How it works...
    6. Capturing the elapsed time between datetime variables
      1. How to do it...
      2. How it works...
      3. See also
    7. Working with time in different time zones
      1. How to do it...
      2. How it works...
      3. See also
  13. Performing Feature Scaling
    1. Technical requirements
    2. Standardizing the features
      1. How to do it...
      2. How it works...
      3. See also
    3. Performing mean normalization
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    4. Scaling to the maximum and minimum values
      1. How to do it...
      2. How it works...
      3. See also
    5. Implementing maximum absolute scaling
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    6. Scaling with the median and quantiles
      1. How to do it...
      2. How it works...
      3. See also
    7. Scaling to vector unit length
      1. How to do it...
      2. How it works...
      3. See also
  14. Applying Mathematical Computations to Features
    1. Technical requirements
    2. Combining multiple features with statistical operations
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Combining pairs of features with mathematical functions
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Performing polynomial expansion
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Deriving new features with decision trees
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Carrying out PCA
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  15. Creating Features with Transactional and Time Series Data
    1. Technical requirements
    2. Aggregating transactions with mathematical operations
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Aggregating transactions in a time window
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Determining the number of local maxima and minima
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Deriving time elapsed between time-stamped events
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    6. Creating features from transactions with Featuretools
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
  16. Extracting Features from Text Variables
    1. Technical requirements
    2. Counting characters, words, and vocabulary
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Estimating text complexity by counting sentences
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Creating features with bag-of-words and n-grams
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Implementing term frequency-inverse document frequency
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Cleaning and stemming text variables
      1. Getting ready
      2. How to do it...
      3. How it works...
  17. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Python Feature Engineering Cookbook
  • Author(s): Soledad Galli
  • Release date: January 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781789806311