Machine Learning in Biotechnology and Life Sciences

Book description

Explore the tools and templates data scientists need to succeed in biotechnology careers with this comprehensive guide.

Key Features

  • Learn the applications of machine learning in the biotechnology and life sciences sectors
  • Discover exciting real-world applications of deep learning and natural language processing
  • Understand the general process of deploying models to cloud platforms such as AWS and GCP

Book Description

The booming fields of biotechnology and life sciences have seen drastic changes over the last few years. With competition growing in every corner, companies around the globe are looking to data-driven methods such as machine learning to optimize processes and reduce costs. This book helps lab scientists, engineers, and managers develop a data scientist's mindset through a hands-on approach to applying machine learning to improve productivity and efficiency.

You'll start with a crash course in Python, SQL, and data science, then develop and tune models from scratch to automate processes and make predictions in the biotechnology and life sciences domain. As you advance, the book covers advanced techniques in machine learning, deep learning, and natural language processing using real-world data.

By the end of this machine learning book, you'll be able to build and deploy your own machine learning models to automate processes and make predictions using AWS and GCP.

What you will learn

  • Get started with Python programming and Structured Query Language (SQL)
  • Develop a machine learning predictive model from scratch using Python
  • Fine-tune deep learning models to optimize their performance for various tasks
  • Find out how to deploy, evaluate, and monitor a model in the cloud
  • Understand how to apply advanced techniques to real-world data
  • Discover how to use key deep learning methods such as LSTMs and transformers

Who this book is for

This book is for data scientists and scientific professionals looking to transition into the biotechnology domain. Scientific professionals who are already established within the pharmaceutical and biotechnology sectors will also find this book useful. A basic understanding of Python programming and a beginner-level background in data science are needed to get the most out of this book.

Table of contents

  1. Machine Learning in Biotechnology and Life Sciences
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share your thoughts
  6. Section 1: Getting Started with Data
  7. Chapter 1: Introducing Machine Learning for Biotechnology
    1. Understanding the biotechnology field
    2. Combining biotechnology and machine learning
    3. Exploring machine learning software
      1. Python (programming language)
      2. MySQL (database)
      3. AWS and GCP (Cloud Computing)
    4. Summary
  8. Chapter 2: Introducing Python and the Command Line
    1. Technical requirements
    2. Introducing the command line
      1. Creating and running Python scripts
      2. Installing packages with pip
      3. When things don't work…
    3. Discovering the Python language
      1. Selecting an IDE
      2. Data types
    4. Tutorial – getting started in Python
      1. Creating variables
      2. Importing installed libraries
      3. General calculations
      4. Lists and dictionaries
      5. Arrays
      6. Creating functions
      7. Iteration and loops
      8. List comprehension
      9. DataFrames
      10. API requests and JSON
      11. Parsing PDFs
      12. Pickling files
      13. Object-oriented programming
    5. Tutorial – working with RDKit and Biopython
      1. Working with Small Molecules and RDKit
    6. Summary
  9. Chapter 3: Getting Started with SQL and Relational Databases
    1. Technical requirements
    2. Exploring relational databases
      1. Database normalization
      2. Types of relational databases
    3. Tutorial – getting started with MySQL
      1. Installing MySQL Workbench
      2. Creating a MySQL instance on AWS
      3. Working with MySQL
      4. Creating databases
      5. Querying data
      6. Conditional querying
      7. Grouping data
      8. Ordering data
      9. Joining tables
    4. Summary
  10. Chapter 4: Visualizing Data with Python
    1. Technical requirements
    2. Exploring the six steps of data visualization
    3. Commonly used visualization libraries
    4. Tutorial – visualizing data in Python
      1. Getting data
      2. Summarizing data with bar plots
      3. Working with distributions and histograms
      4. Visualizing features with scatter plots
      5. Identifying correlations with heat maps
      6. Displaying sequential and time-series plots
      7. Emphasizing flows with Sankey diagrams
      8. Visualizing small molecules
      9. Visualizing large molecules
    5. Summary
  11. Section 2: Developing and Training Models
  12. Chapter 5: Understanding Machine Learning
    1. Technical requirements
    2. Understanding ML
    3. Overfitting and underfitting
    4. Developing an ML model
      1. Data acquisition
      2. Exploratory data analysis and preprocessing
      3. Developing and validating models
      4. Saving a model for deployment
    5. Summary
  13. Chapter 6: Unsupervised Machine Learning
    1. Introduction to UL
    2. Understanding clustering algorithms
      1. Exploring the different clustering algorithms
      2. Tutorial – breast cancer prediction via clustering
    3. Understanding DR
      1. Avoiding the COD
      2. Tutorial – exploring DR models
    4. Summary
  14. Chapter 7: Supervised Machine Learning
    1. Understanding supervised learning
    2. Measuring success in supervised machine learning
      1. Measuring success with classifiers
      2. Measuring success with regressors
    3. Understanding classification in supervised machine learning
      1. Exploring different classification models
      2. Tutorial: Classification of proteins using GCP
    4. Understanding regression in supervised machine learning
      1. Exploring different regression models
      2. Tutorial: Regression for property prediction
    5. Summary
  15. Chapter 8: Understanding Deep Learning
    1. Understanding the field of deep learning
      1. Neural networks
      2. The perceptron
      3. Exploring the different types of deep learning models
    2. Selecting an activation function
    3. Measuring progress with loss
      1. Deep learning with Keras
      2. Understanding the differences between Keras and TensorFlow
      3. Getting started with Keras and ANNs
    4. Tutorial – protein sequence classification via LSTMs using Keras and MLflow
      1. Importing the necessary libraries and datasets
      2. Checking the dataset
      3. Splitting the dataset
      4. Preprocessing the data
      5. Developing models with Keras and MLflow
      6. Reviewing the model's performance
    5. Tutorial – anomaly detection in manufacturing using AWS Lookout for Vision
    6. Summary
  16. Chapter 9: Natural Language Processing
    1. Introduction to NLP
    2. Getting started with NLP using NLTK and SciPy
    3. Working with structured data
      1. Searching for scientific articles
      2. Exploring our datasets
    4. Tutorial – clustering and topic modeling
    5. Working with unstructured data
      1. OCR using AWS Textract
      2. Entity recognition using AWS Comprehend
    6. Tutorial – developing a scientific data search engine using transformers
    7. Summary
  17. Chapter 10: Exploring Time Series Analysis
    1. Understanding time series data
      1. Treating time series data as a structured dataset
    2. Exploring the components of a time series dataset
    3. Tutorial – forecasting demand using Prophet and LSTM
      1. Using Prophet for time series modeling
      2. Using LSTM for time series modeling
    4. Summary
  18. Section 3: Deploying Models to Users
  19. Chapter 11: Deploying Models with Flask Applications
    1. Understanding API frameworks
    2. Working with Flask and Visual Studio Code
    3. Using Flask as an API and web application
    4. Tutorial – Deploying a pretrained model using Flask
    5. Summary
  20. Chapter 12: Deploying Applications to the Cloud
    1. Exploring current cloud computing platforms
    2. Understanding containers and images
      1. Understanding the benefits of containers
    3. Tutorial – deploying a container to AWS (Lightsail)
    4. Tutorial – deploying an application to GCP (App Engine)
    5. Tutorial – deploying an application's code to GitHub
    6. Summary
    7. Why subscribe?
  21. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share your thoughts

Product information

  • Title: Machine Learning in Biotechnology and Life Sciences
  • Author(s): Saleh Alkhalifa
  • Release date: January 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781801811910