Machine Learning with Python

Published by Pearson

Content level: Intermediate to advanced

Create Production-Quality Machine Learning Pipelines with Feature Engineering

  • Develop Python code pipelines for data engineering and machine learning model training.
  • Use existing powerful open-source packages, like scikit-learn, to write your own professional code for data engineering, transformation and pre-processing.
  • Learn about the power of pipelines and why they are far superior to hard-coding values and procedures.
  • Correctly and safely save your trained models so they are easy to include in your Python package, making testing, installation, and deployment simpler and smoother.

Real-world data can be used to train and invoke machine learning (ML) models, but it usually needs work first. In fact, raw data often has characteristics that make it unsuitable for ML models. For example, it can contain missing values, which some ML libraries cannot handle. It can also arrive as strings or categories rather than numbers, while models can only compute with numbers. There are many such challenges, so data manipulation and transformation are required before training or invoking ML models.

Pipelines can help. A pipeline assembles a set of steps that are performed one after the other: each step takes an input and produces an output, which becomes the input to the next step. Pipelines streamline work that can otherwise be computationally expensive and time-consuming. They also work well with existing libraries, enforce the desired order of steps, and provide a convenient workflow that makes the work reproducible.

With code pipelines, you can sequentially apply a list of transforms and then fit a final ML model. An ML pipeline improves the performance and organization of the entire model portfolio, gets models into production more quickly, and makes ML models easier to manage.
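
The chaining idea described above can be illustrated with a minimal, standard-library-only sketch. The class names (`ToyMeanImputer`, `ToyRangeScaler`, `ToyPipeline`) are hypothetical and written for illustration only; this is not scikit-learn's implementation, just one way to show how each step's output feeds the next step and how parameters learned during fitting are reused on new data.

```python
from statistics import mean


class ToyMeanImputer:
    """Hypothetical step: replace None with the mean learned during fit."""

    def fit(self, values):
        self.mean_ = mean(v for v in values if v is not None)
        return self

    def transform(self, values):
        return [self.mean_ if v is None else v for v in values]


class ToyRangeScaler:
    """Hypothetical step: rescale to [0, 1] using the range seen during fit."""

    def fit(self, values):
        self.min_, self.max_ = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.max_ - self.min_) or 1.0  # avoid division by zero
        return [(v - self.min_) / span for v in values]


class ToyPipeline:
    """Hypothetical pipeline: run the steps in order, passing outputs along."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        # Each step is fitted on the output of the previous step.
        for step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        # New data goes through the same steps with the learned parameters.
        for step in self.steps:
            values = step.transform(values)
        return values


train = [2.0, None, 4.0, 6.0]
pipe = ToyPipeline([ToyMeanImputer(), ToyRangeScaler()]).fit(train)
train_out = pipe.transform(train)       # [0.0, 0.5, 0.5, 1.0]
new_out = pipe.transform([None, 5.0])   # [0.5, 0.75]
```

Note that the new data is imputed and scaled with the mean and range learned from the training data, not with values typed into the code by hand.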

What you’ll learn and how you can apply it

By the end of the live online course, you’ll understand:

  • The difference between code pipelines for data engineering and ML and code that relies on hard-coded values and procedures.
  • How to professionally write code pipelines for data engineering, pre-processing and machine learning model training.
  • How to use many powerful open-source data engineering, transformation and manipulation packages in Python.
  • How to professionally write your own data transformers in the scikit-learn style.
  • How to run pipelines to apply data manipulation and machine learning model training.
  • How to select and apply the correct transformations based on the type of your data.
  • How to correctly and safely save and restore pipelines and models for future use.

And you’ll be able to:

  • Create professional code pipelines for machine learning tasks which will make other tasks such as model deployment and testing easier.
  • Take advantage of existing open-source Python packages to automate machine learning tasks.
  • Become proficient and comfortable with creating your own data transformers if need be. Here you will learn how to use object-oriented programming style and write code compatible with Python’s scikit-learn package.
  • Apply multiple data transformation and pre-processing techniques to various types of variables such as continuous, categorical and temporal variables.
  • Correctly and safely save and reload data transformers and machine learning models for future use.

This live event is for you because...

  • You are a data engineer or developer and would like to stand out from the crowd by learning how to write professional, production-quality ML code.
  • You are a data engineer or developer and want to learn how to correctly transform and pre-process various types of data such as continuous, categorical and temporal variables.
  • You want to tackle the daunting task of writing your own data transformers and processors that are suitable for use within pipelines.
  • You are a data engineer or scientist and want to learn the essential skill of saving and loading models, data transformers, and entire pipelines, safely and correctly.
  • You are a data engineer/scientist or developer and you would like to learn how to prepare your models for packaging and deployment by either writing your own code, or taking advantage of existing powerful freely-available libraries.

Prerequisites

  • Familiarity with Python. Students should be relatively comfortable with Python coding practices (i.e. intermediate Python level and object-oriented programming in Python).
  • Basic knowledge of data manipulation and machine learning in Python.
  • Familiarity with training machine learning models and using them to obtain predictions (using Python’s scikit-learn package).

Course Set-up

  • Any operating system is fine.
  • Python 3.6 or above (Anaconda distribution).
  • Speedy internet connection.

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Segment 1: Overview of Feature Engineering and Data Analysis. (50 minutes)

  • Definition of Feature Engineering, Selection and Pre-processing.
  • Overview of different types of features (temporal, numerical and categorical).
  • Exploratory data analysis and the problem of missing data.
  • Feature Engineering Demo.
  • ML model training demo.
  • Reproducibility and its importance in machine learning.
  • Live Demo and Python code examples of the above.

Q&A (10 minutes)

Break (10 minutes)

Segment 2: Feature Engineering with Open Source and In-house Tools. (50 minutes)

  • Example code for feature engineering with open source libraries.
  • Overview of the scikit-learn API.
  • How to create your own scikit-learn compatible feature engineering transformers (includes a short overview of object-oriented programming in Python).
  • How to create feature engineering transformers that learn parameters from data.
  • Example data engineering techniques (including, but not limited to, dealing with missing data, feature transformation, scaling, standardization, encoding and value mapping).
  • Demonstration of why hard-coding values in data engineering is inconvenient and problematic (so that in the next segment we can see the power and flexibility of pipelines).
  • Live Demo and Python code examples of the above.
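
As a rough illustration of a scikit-learn compatible transformer that learns its parameters from data, here is a minimal sketch. The class name `ColumnMeanImputer` and the sample arrays are hypothetical; the only scikit-learn pieces used are the `BaseEstimator` and `TransformerMixin` base classes. It is not the course's own code, only one possible shape such a transformer could take.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnMeanImputer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: fill NaNs with per-column means learned in fit."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned parameters carry a trailing underscore by scikit-learn convention.
        self.means_ = np.nanmean(X, axis=0)
        return self  # returning self allows chaining, e.g. fit(...).transform(...)

    def transform(self, X):
        X = np.asarray(X, dtype=float).copy()  # do not modify the caller's data
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.means_[cols]
        return X


# Illustrative data: the means are learned here instead of being hard-coded.
X_train = np.array([[1.0, 10.0], [np.nan, 30.0], [3.0, np.nan]])
imputer = ColumnMeanImputer().fit(X_train)

# New rows are filled with the means learned from the training data.
X_new = imputer.transform(np.array([[np.nan, np.nan]]))
```

Because the class follows the `fit`/`transform` convention, `TransformerMixin` supplies `fit_transform` automatically, and the transformer can be placed as a step inside a scikit-learn `Pipeline`.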

Q&A (10 minutes)

Break (10 minutes)

Segment 3: Creating End-to-end Feature Engineering and ML Pipelines. (50 minutes)

  • Overview of scikit-learn pipelines.
  • Discussion of why pipelines are highly useful.
  • A grid-search example to demonstrate why pipelines are better than separate steps.
  • How to create a complete feature engineering and ML pipeline (includes model fitting).
  • Using the pipeline to make predictions.
  • Feature selection and whether it should be part of the pipeline.
  • Creating a pipeline that is ready for deployment.
  • How to correctly and safely save and reload pipelines.
  • Pipelines for deep learning.
  • Live Demo and Python code examples of the above.
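
The grid-search and save/reload topics above might look roughly like the following sketch. The dataset (iris), the step names `scale` and `clf`, the parameter grid, and the file name are all assumptions chosen for illustration, not material from the course. Putting the scaler inside the pipeline means it is re-fitted on the training part of every cross-validation split, which is one reason a pipeline is preferable to scaling as a separate step beforehand.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data set bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pre-processing and model live in one object.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "<step name>__<parameter>" addresses a parameter of a pipeline step.
search = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)
best_pipe = search.best_estimator_

# Save the whole fitted pipeline and load it back (temporary path, for illustration).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.joblib")
    joblib.dump(best_pipe, path)
    restored = joblib.load(path)

# The restored pipeline should reproduce the original predictions.
same_predictions = np.array_equal(best_pipe.predict(X_test), restored.predict(X_test))
```

On the "safely" part: joblib files are pickle-based, so they should only be loaded from trusted sources, and it is generally advisable to reload them with the same library versions that were used to save them.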

Q&A (10 minutes)

Course wrap-up and next steps (10 minutes)

Your Instructor

  • Noureddin Sadawi

    Dr. Noureddin Sadawi is a consultant in machine/deep learning and data science. He has several years’ experience in various areas involving data manipulation and analysis. He received his PhD from the University of Birmingham, United Kingdom. He is the winner of two international scientific software development contests - at TREC2011 and CLEF2012.

    Noureddin is an avid scientific software researcher and developer with a passion for learning and teaching new technologies. He is an experienced scientific software developer and data analyst; over the last few years he has been using Python as his preferred programming language. Also, he has been involved in several projects spanning a variety of fields such as bioinformatics, textual/image/video data analysis, drug discovery, omics data analysis and computer network security. He has taught at multiple universities in the UK and has worked as a software engineer in different roles. He is the founder of SoftLight LTD (https://www.softlight.tech/), a London-based company that specialises in data science and machine/deep learning. Recently, he has joined the University of Oxford as a part-time lecturer.