Bioinformatics with Python Cookbook - Third Edition

Book description

Discover modern, next-generation sequencing libraries from the powerful Python ecosystem to perform cutting-edge research and analyze large amounts of biological data.

Key Features

  • Perform complex bioinformatics analysis using the most essential Python libraries and applications
  • Implement next-generation sequencing, metagenomics, automated analysis, population genetics, and much more
  • Explore various statistical and machine learning techniques for bioinformatics data analysis

Book Description

Bioinformatics is an active research field that uses a range of simple-to-advanced computations to extract valuable information from biological data, and this book will show you how to manage these tasks using Python.

This updated third edition of the Bioinformatics with Python Cookbook begins with a quick overview of the various tools and libraries in the Python ecosystem that will help you convert, analyze, and visualize biological datasets. Next, you'll cover key techniques for next-generation sequencing, single-cell analysis, genomics, metagenomics, population genetics, phylogenetics, and proteomics with the help of real-world examples. You'll learn how to work with important pipeline systems, such as Galaxy servers and Snakemake, and understand the various modules in Python for functional and asynchronous programming. This book will also help you explore topics such as SNP discovery using statistical approaches under high-performance computing frameworks, including Dask and Spark. In addition to this, you'll explore the application of machine learning algorithms in bioinformatics.

By the end of this book, you'll be equipped with the knowledge you need to implement the latest programming techniques and frameworks, empowering you to deal with bioinformatics data on every scale.

What you will learn

  • Become well versed in data processing libraries such as NumPy, pandas, Arrow, and Zarr in the context of bioinformatic analysis
  • Interact with genomic databases
  • Solve real-world problems in the fields of population genetics, phylogenetics, and proteomics
  • Build bioinformatics pipelines using a Galaxy server and Snakemake
  • Work with functools and itertools for functional programming
  • Perform parallel processing with Dask on biological data
  • Explore principal component analysis (PCA) techniques with scikit-learn
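
As a rough illustration of the functools and itertools item above, the following minimal sketch counts k-mers in a DNA string using a lazy generator and a fold. It is not taken from the book: the function names kmers and count_kmers and the sample sequence are hypothetical, and only the Python standard library is assumed.

```python
from functools import reduce
from itertools import islice


def kmers(sequence, k):
    """Lazily yield every overlapping k-mer of length k (hypothetical helper)."""
    return (sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmers(sequence, k):
    """Fold the k-mer stream into a dict of counts (hypothetical helper).

    Each step builds a new dict instead of mutating the accumulator,
    which keeps the inner function pure at the cost of extra copying.
    """
    def add(counts, kmer):
        return {**counts, kmer: counts.get(kmer, 0) + 1}

    return reduce(add, kmers(sequence, k), {})


if __name__ == "__main__":
    sample = "ACGTACGT"  # made-up sequence, for illustration only
    # islice pulls just the first two k-mers without building the full list.
    print(list(islice(kmers(sample, 3), 2)))
    print(count_kmers(sample, 2))
```

Rebuilding the dict on every step favors clarity over speed; for real datasets, collections.Counter over the same generator would be the more practical choice.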

Who this book is for

This book is for bioinformatics analysts, data scientists, computational biologists, researchers, and Python developers who want to address intermediate-to-advanced biological and bioinformatics problems. Working knowledge of the Python programming language is expected. Basic knowledge of biology will also be helpful.

Table of contents

  1. Bioinformatics with Python Cookbook - Third Edition
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
    4. Download the color images
    5. Conventions used
    6. Sections
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There’s more…
      5. See also
    7. Get in touch
    8. Reviews
    9. Share Your Thoughts
  6. Chapter 1: Python and the Surrounding Software Ecology
    1. Installing the required basic software with Anaconda
      1. Getting ready
      2. How to do it...
      3. There’s more...
    2. Installing the required software with Docker
      1. Getting ready
      2. How to do it...
      3. See also
    3. Interfacing with R via rpy2
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    4. Performing R magic with Jupyter
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
  7. Chapter 2: Getting to Know NumPy, pandas, Arrow, and Matplotlib
    1. Using pandas to process vaccine-adverse events
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    2. Dealing with the pitfalls of joining pandas DataFrames
      1. Getting ready
      2. How to do it...
      3. There’s more...
    3. Reducing the memory usage of pandas DataFrames
      1. Getting ready
      2. How to do it…
      3. See also
    4. Accelerating pandas processing with Apache Arrow
      1. Getting ready
      2. How to do it...
      3. There’s more...
    5. Understanding NumPy as the engine behind Python data science and bioinformatics
      1. Getting ready
      2. How to do it…
      3. See also
    6. Introducing Matplotlib for chart generation
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
  8. Chapter 3: Next-Generation Sequencing
    1. Accessing GenBank and moving around NCBI databases
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    2. Performing basic sequence analysis
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    3. Working with modern sequence formats
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    4. Working with alignment data
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    5. Extracting data from VCF files
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    6. Studying genome accessibility and filtering SNP data
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    7. Processing NGS data with HTSeq
      1. Getting ready
      2. How to do it...
      3. There’s more...
  9. Chapter 4: Advanced NGS Data Processing
    1. Preparing a dataset for analysis
      1. Getting ready
      2. How to do it…
    2. Using Mendelian error information for quality control
      1. How to do it…
      2. There’s more…
    3. Exploring the data with standard statistics
      1. How to do it…
      2. There’s more…
    4. Finding genomic features from sequencing annotations
      1. How to do it…
      2. There’s more…
    5. Doing metagenomics with QIIME 2 Python API
      1. Getting ready
      2. How to do it...
      3. There’s more...
  10. Chapter 5: Working with Genomes
    1. Technical requirements
    2. Working with high-quality reference genomes
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    3. Dealing with low-quality genome references
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    4. Traversing genome annotations
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    5. Extracting genes from a reference using annotations
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    6. Finding orthologues with the Ensembl REST API
      1. Getting ready
      2. How to do it...
      3. There’s more...
    7. Retrieving gene ontology information from Ensembl
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
  11. Chapter 6: Population Genetics
    1. Managing datasets with PLINK
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    2. Using sgkit for population genetics analysis with xarray
      1. Getting ready
      2. How to do it...
      3. There’s more...
    3. Exploring a dataset with sgkit
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    4. Analyzing population structure
      1. Getting ready
      2. How to do it...
      3. See also
    5. Performing a PCA
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    6. Investigating population structure with admixture
      1. Getting ready
      2. How to do it...
      3. There’s more...
  12. Chapter 7: Phylogenetics
    1. Preparing a dataset for phylogenetic analysis
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    2. Aligning genetic and genomic data
      1. Getting ready
      2. How to do it...
    3. Comparing sequences
      1. Getting ready
      2. How to do it...
      3. There’s more...
    4. Reconstructing phylogenetic trees
      1. Getting ready
      2. How to do it...
      3. There’s more...
    5. Playing recursively with trees
      1. Getting ready
      2. How to do it...
      3. There’s more...
    6. Visualizing phylogenetic data
      1. Getting ready
      2. How to do it...
      3. There’s more...
  13. Chapter 8: Using the Protein Data Bank
    1. Finding a protein in multiple databases
      1. Getting ready
      2. How to do it...
      3. There’s more
    2. Introducing Bio.PDB
      1. Getting ready
      2. How to do it...
      3. There’s more
    3. Extracting more information from a PDB file
      1. Getting ready
      2. How to do it...
    4. Computing molecular distances on a PDB file
      1. Getting ready
      2. How to do it...
    5. Performing geometric operations
      1. Getting ready
      2. How to do it...
      3. There’s more
    6. Animating with PyMOL
      1. Getting ready
      2. How to do it...
      3. There’s more
    7. Parsing mmCIF files using Biopython
      1. Getting ready
      2. How to do it...
      3. There’s more
  14. Chapter 9: Bioinformatics Pipelines
    1. Introducing Galaxy servers
      1. Getting ready
      2. How to do it…
      3. There’s more
    2. Accessing Galaxy using the API
      1. Getting ready
      2. How to do it…
    3. Deploying a variant analysis pipeline with Snakemake
      1. Getting ready
      2. How to do it…
      3. There’s more
    4. Deploying a variant analysis pipeline with Nextflow
      1. Getting ready
      2. How to do it…
      3. There’s more
  15. Chapter 10: Machine Learning for Bioinformatics
    1. Introducing scikit-learn with a PCA example
      1. Getting ready
      2. How to do it...
      3. There’s more...
    2. Using clustering over PCA to classify samples
      1. Getting ready
      2. How to do it...
      3. There’s more...
    3. Exploring breast cancer traits using Decision Trees
      1. Getting ready
      2. How to do it...
    4. Predicting breast cancer outcomes using Random Forests
      1. Getting ready
      2. How to do it…
      3. There’s more...
  16. Chapter 11: Parallel Processing with Dask and Zarr
    1. Reading genomics data with Zarr
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    2. Parallel processing of data using Python multiprocessing
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    3. Using Dask to process genomic data based on NumPy arrays
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
    4. Scheduling tasks with dask.distributed
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also
  17. Chapter 12: Functional Programming for Bioinformatics
    1. Understanding pure functions
      1. Getting ready
      2. How to do it...
      3. There’s more...
    2. Understanding immutability
      1. Getting ready
      2. How to do it...
      3. There’s more...
    3. Avoiding mutability as a robust development pattern
      1. Getting ready
      2. How to do it...
      3. There’s more...
    4. Using lazy programming for pipelining
      1. Getting ready
      2. How to do it...
      3. There’s more...
    5. The limits of recursion with Python
      1. Getting ready
      2. How to do it...
      3. There’s more...
    6. A showcase of Python’s functools module
      1. Getting ready
      2. How to do it...
      3. There’s more...
      4. See also...
  18. Index
    1. Why subscribe?
  19. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Bioinformatics with Python Cookbook - Third Edition
  • Author(s): Tiago Antao
  • Release date: September 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803236421