Book description
Over 60 recipes to model and handle real-life biological data using modern libraries from the R ecosystem
Key Features
- Apply modern R packages to handle biological data using real-world examples
- Represent biological data with advanced visualizations suitable for research and publications
- Handle real-world problems in bioinformatics such as next-generation sequencing, metagenomics, and automating analyses
Book Description
Handling biological data effectively requires an in-depth knowledge of machine learning techniques and computational skills, along with an understanding of how to use tools such as edgeR and DESeq. With the R Bioinformatics Cookbook, you’ll explore all this and more, tackling common and not-so-common challenges in the bioinformatics domain using real-world examples.
This book will use a recipe-based approach to show you how to perform practical research and analysis in computational biology with R. You will learn how to effectively analyze your data with the latest tools in Bioconductor, ggplot, and tidyverse. The book will guide you through the essential tools in Bioconductor to help you understand and carry out protocols in RNAseq, phylogenetics, genomics, and sequence analysis. As you progress, you will get up to speed with how machine learning techniques can be used in the bioinformatics domain. You will gradually develop key computational skills such as creating reusable workflows in R Markdown and packages for code reuse.
By the end of this book, you’ll have gained a solid understanding of the most important and widely used techniques in bioinformatic analysis and the tools you need to work with real biological data.
What you will learn
- Employ Bioconductor to determine differential expression in RNAseq data
- Run SAMtools and develop pipelines to find single nucleotide polymorphisms (SNPs) and Indels
- Use ggplot to create and annotate a range of visualizations
- Query external databases with Ensembl to find functional genomics information
- Execute large-scale multiple sequence alignment with DECIPHER to perform comparative genomics
- Use d3.js and Plotly to create dynamic and interactive web graphics
- Use k-nearest neighbors, support vector machines, and random forests to find groups and classify data
Who this book is for
This book is for bioinformaticians, data analysts, researchers, and R developers who want to address intermediate-to-advanced biological and bioinformatics problems through a recipe-based approach. A working knowledge of the R programming language and a basic knowledge of bioinformatics are prerequisites.
Table of contents
- Title Page
- Copyright and Credits
- About Packt
- Contributors
- Preface
- Performing Quantitative RNAseq
  - Technical requirements
  - Estimating differential expression with edgeR
  - Estimating differential expression with DESeq2
  - Power analysis with powsimR
  - Finding unannotated transcribed regions
  - Finding regions showing high expression ab initio with bumphunter
  - Differential peak analysis
  - Estimating batch effects using SVA
  - Finding allele-specific expressions with AllelicImbalance
  - Plotting and presenting RNAseq data
- Finding Genetic Variants with HTS Data
  - Technical requirements
  - Finding SNPs and indels from sequence data using VariantTools
  - Predicting open reading frames in long reference sequences
  - Plotting features on genetic maps with karyoploteR
  - Selecting and classifying variants with VariantAnnotation
  - Extracting information in genomic regions of interest
  - Finding phenotype and genotype associations with GWAS
  - Estimating the copy number at a locus of interest
- Searching Genes and Proteins for Domains and Motifs
  - Technical requirements
  - Finding DNA motifs with universalmotif
  - Finding protein domains with PFAM and bio3d
  - Finding InterPro domains
  - Performing multiple alignments of genes or proteins
  - Aligning genomic length sequences with DECIPHER
  - Machine learning for novel feature detection in proteins
  - 3D structure protein alignment with bio3d
- Phylogenetic Analysis and Visualization
  - Technical requirements
  - Reading and writing varied tree formats with ape and treeio
  - Visualizing trees of many genes quickly with ggtree
  - Quantifying differences between trees with treespace
  - Extracting and working with subtrees using ape
  - Creating dot plots for alignment visualization
  - Reconstructing trees from alignments using phangorn
- Metagenomics
  - Technical requirements
  - Loading in hierarchical taxonomic data using phyloseq
  - Rarefying counts and correcting for sample differences using metacoder
  - Reading amplicon data from raw reads with dada2
  - Visualizing taxonomic abundances with heat trees in metacoder
  - Computing sample diversity with vegan
  - Splitting sequence files into OTUs
- Proteomics from Spectrum to Annotation
  - Technical requirements
  - Representing raw MS data visually
  - Viewing proteomics data in a genome browser
  - Visualizing distributions of peptide hit counts to find thresholds
  - Converting MS formats to move data between tools
  - Matching spectra to peptides for verification with protViz
  - Applying quality control filters to spectra
  - Identifying genomic loci that match peptides
- Producing Publication and Web-Ready Visualizations
  - Technical requirements
  - Visualizing multiple distributions with ridgeplots
  - Creating colormaps for two-variable data
  - Representing relational data as networks
  - Creating interactive web graphics with plotly
  - Constructing three-dimensional plots with plotly
  - Constructing circular genome plots of polyomic data
- Working with Databases and Remote Data Sources
  - Technical requirements
  - Retrieving gene and genome annotation from BioMart
  - Retrieving and working with SNPs
  - Getting gene ontology information
  - Finding experiments and reads from SRA/ENA
  - Performing quality control and filtering on high-throughput sequence reads
  - Completing read-to-reference alignment with external programs
  - Visualizing the quality control of read-to-reference alignments
- Useful Statistical and Machine Learning Methods
  - Technical requirements
  - Correcting p-values to account for multiple hypotheses
  - Generating a simulated dataset to represent a background
  - Learning groupings within data and classifying with kNN
  - Predicting classes with random forests
  - Predicting classes with SVM
  - Learning groups in data without prior information
  - Identifying the most important variables in data with random forests
  - Identifying the most important variables in data with PCA
- Programming with Tidyverse and Bioconductor
- Building Objects and Packages for Code Reuse
  - Technical requirements
  - Creating simple S3 objects to simplify code
  - Taking advantage of generic object functions with S3 classes
  - Creating structured and formal objects with the S4 system
  - Simple ways to package code for sharing and reuse
  - Using devtools to host code from GitHub
  - Building a unit test suite to ensure that functions work as you intend
  - Using continuous integration with Travis to keep code tested and up to date
- Other Books You May Enjoy
Product information
- Title: R Bioinformatics Cookbook
- Author(s):
- Release date: October 2019
- Publisher(s): Packt Publishing
- ISBN: 9781789950694