Genomics in the AWS Cloud

Book description

Perform genome analysis and sequencing of data with Amazon Web Services

Genomics in the AWS Cloud: Analyzing Genetic Code Using Amazon Web Services enables anyone with moderate familiarity with the AWS Cloud to perform full genome analysis and research. Using the information in this book, you'll be able to take a FASTQ file containing raw data from a lab or a BAM file from a service provider and perform genome analysis on it. You'll also be able to identify potentially pathogenic gene sequences.

  • Get an introduction to Whole Genome Sequencing (WGS)
  • Make sense of WGS on AWS
  • Master AWS services for genome analysis

A key advantage of using AWS for genomic analysis is the wide choice of compute services that researchers can use to process diverse datasets in analysis pipelines. Genomic sequencers that generate raw data files are located in on-premises labs, and AWS provides solutions that let customers transfer these files to AWS reliably and securely. Storing genomic and medical (e.g., imaging) data at different stages requires enormous amounts of cost-effective storage. Amazon Simple Storage Service (Amazon S3), Amazon Glacier, and Amazon Elastic Block Store (Amazon EBS) provide the necessary solutions to securely store, manage, and scale genomic file storage. Moreover, these storage services can interface with various AWS compute services to process the files.

Whether you're just getting started or have already been analyzing genomics data using the AWS Cloud, this book provides you with the information you need in order to use AWS services and features in the ways that will make the most sense for your genomic research.

Table of contents

  1. Cover
  2. Title Page
  3. Introduction
    1. Who Should Read This Book
    2. Genomics
    3. Cloud Computing and AWS
    4. What You'll Learn from This Book
    5. Our Story
    6. Getting Under Way
  4. CHAPTER 1: Why Do Genome Analysis Yourself When Commercial Offerings Exist?
    1. Commercial Sequencing Services
    2. Typical Results
    3. Summary
  5. CHAPTER 2: A Crash Course in Molecular Biology
    1. DNA
    2. DNA at Work: RNA and Proteins
    3. Inheritance
    4. Summary
  6. CHAPTER 3: Obtaining Your Genome
    1. Preparing to Have Your Genome Sequenced
    2. Specifying Lab Work
    3. Engaging a Laboratory
    4. Getting a Tissue Sample for DNA Extraction
    5. Shipping the Sample
    6. Receiving the Results
    7. Summary
  7. CHAPTER 4: The Bioinformatics Workflow
    1. Extraction of DNA
    2. FASTA Files
    3. FASTQ Files
    4. Alignment to a Reference Genome
    5. Reference Genomes
    6. Quality Control
    7. Trimming
    8. The Alignment Process
    9. Marking Duplicates
    10. Recalibrating Base Quality Score
    11. Calling SNVs and Indel Variants
    12. Annotating SNVs and Indel Variants
    13. Prioritizing Variants
    14. Inheritance Analysis
    15. Identifying SVs and CNVs
    16. Bioinformatics Workflow
    17. Summary
  8. CHAPTER 5: AWS Services for Genome Analysis
    1. General Concepts
    2. Custom Environments
    3. Summary
  9. CHAPTER 6: Building Your Environment in the AWS Cloud
    1. Setting Up a Virtual Private Cloud
    2. Setting Up and Launching an EC2 Instance
    3. Setting Up S3 Buckets
    4. Configuring Your Account Securely
    5. Creating Groups
    6. Creating Users
    7. Setting Up Your Client Environment
    8. Summary
  10. CHAPTER 7: Linux and AWS Command-Line Basics for Genomics
    1. Selecting a Linux Distribution
    2. Accessing Your AWS Linux Instance from Your Local Computer
    3. Getting Familiar with the Command Line
    4. Transferring Files to and from Your AWS Instance
    5. Running Programs in the Background
    6. Understanding File Permissions
    7. Compressing and Archiving Files
    8. Managing Linux
    9. The AWS Command-Line Interface
    10. AWS CLI Essentials
    11. An Alternative Approach: AWS Systems Manager
    12. Summary
  11. CHAPTER 8: Processing the Sequencing Data
    1. Getting from Data to Information
    2. Setting Up AWS Services and Data Storage
    3. Summary
  12. CHAPTER 9: Visualizing the Genome
    1. Introducing Genome Visualizers
    2. Installing the IGV Desktop Visualizer
    3. Analyzing Variants in IGV
    4. Summary
  13. CHAPTER 10: Containerizing Your Workflow on the Desktop
    1. Introducing Containerization
    2. Understanding and Using Docker
    3. Summary
  14. CHAPTER 11: Variants and Applications
    1. Polygenic Risk Scores
    2. Metagenomics
    3. AlphaFold
    4. Summary
  15. CHAPTER 12: Cancer Genomics
    1. Somatic Genomes
    2. Cancer
    3. The Promise and Reality of Cancer Precision Medicine
    4. Samples
    5. Somatic Variant Analysis
    6. Copy Number Changes
    7. Measuring Tumor Genomic Instability
    8. Summary
    9. Notes
  16. Index
  17. Copyright
  18. Dedication
  19. Acknowledgments
  20. About the Authors
  21. End User License Agreement

Product information

  • Title: Genomics in the AWS Cloud
  • Author(s): David Wall, Catherine Vacher
  • Release date: May 2023
  • Publisher(s): Wiley
  • ISBN: 9781119573371