Discover over 80 recipes for modeling and handling real-life biological data using modern libraries from the R ecosystem
Key Features
- Apply modern R packages to process biological data using real-world examples
- Represent biological data with advanced visualizations and workflows suitable for research and publications
- Solve real-world bioinformatics problems such as transcriptomics, genomics, and phylogenetics
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description
The updated second edition of R Bioinformatics Cookbook takes a recipe-based approach to show you how to conduct practical research and analysis in computational biology with R. You'll learn how to create a useful and modular R working environment, and how to load, clean, and analyze data using the most up-to-date Bioconductor, ggplot2, and tidyverse tools.
This book will walk you through the Bioconductor tools necessary for you to understand and carry out protocols in RNA-seq and ChIP-seq, phylogenetics, genomics, gene search, gene annotation, statistical analysis, and sequence analysis. As you advance, you'll find out how to use Quarto to create data-rich reports, presentations, and websites, as well as get a clear understanding of how machine learning techniques can be applied in the bioinformatics domain. The concluding chapters will help you develop proficiency in key skills, such as gene annotation analysis and functional programming in purrr and base R. Finally, you'll discover how to use the latest AI tools, including ChatGPT, to generate, edit, and understand R code and draft workflows for complex analyses.
By the end of this book, you'll have gained a solid understanding of the skills and techniques needed to become a bioinformatics specialist and efficiently work with large and complex bioinformatics datasets.
What you will learn
- Set up a working environment for bioinformatics analysis with R
- Import, clean, and organize bioinformatics data using tidyr
- Create publication-quality plots, reports, and presentations using ggplot2 and Quarto
- Analyze RNA-seq, ChIP-seq, genomics, and next-generation genetics with Bioconductor
- Search for genes and proteins by performing phylogenetics and gene annotation
- Apply ML techniques to bioinformatics data using mlr3
- Streamline programmatic work using iterators and functional tools in base R and the purrr package
- Use ChatGPT to create, annotate, and debug code and workflows
Who this book is for
This book is for bioinformaticians, data analysts, researchers, and R developers who want to address intermediate-to-advanced biological and bioinformatics problems by learning via a recipe-based approach. Working knowledge of the R programming language and basic knowledge of bioinformatics are prerequisites.
Table of contents
- R Bioinformatics Cookbook, Second Edition
- Contributors
- About the author
- About the reviewer
- Preface
- Chapter 1: Setting Up Your R Bioinformatics Working Environment
  - Technical requirements
  - Setting up an R project in a directory
  - Using the here package to simplify working with paths
  - Using the devtools package to work with the latest non-CRAN packages
  - Setting up your machine for the compilation of source packages
  - Using the renv package to create a project-specific set of packages
  - Installing and managing different versions of Bioconductor packages in environments
  - Using bioconda to install external tools
- Chapter 2: Loading, Tidying, and Cleaning Data in the tidyverse
  - Technical requirements
  - Loading data from files with readr
  - Tidying a wide format table into a tidy table with tidyr
  - Tidying a long format table into a tidy table with tidyr
  - Combining tables using join functions
  - Reformatting and extracting existing data into new columns using stringr
  - Computing new data columns from existing ones and applying arbitrary functions using mutate()
  - Using dplyr to summarize data in large tables
  - Using datapasta to create R objects from cut-and-paste data
- Chapter 3: ggplot2 and Extensions for Publication Quality Plots
  - Technical requirements
  - Combining many plot types in ggplot2
  - Comparing changes in distributions with ggridges
  - Customizing plots with ggeasy
  - Highlighting selected values in busy plots with gghighlight
  - Plotting variability and confidence intervals better with ggdist
  - Making interactive plots with plotly
  - Clarifying label placement with ggrepel
  - Zooming and making callouts from selected plot sections with facetzoom
- Chapter 4: Using Quarto to Make Data-Rich Reports, Presentations, and Websites
- Chapter 5: Easily Performing Statistical Tests Using Linear Models
  - Technical requirements
  - Modeling data with a linear model
  - Using a linear model to compare the mean of two groups
  - Using a linear model and ANOVA to compare multiple groups in a single variable
  - Using linear models and ANOVA to compare multiple groups in multiple variables
  - Testing and accounting for interactions between variables in linear models
  - Doing tests for differences in data in two categorical variables
  - Making predictions using linear models
- Chapter 6: Performing Quantitative RNA-seq
  - Technical requirements
  - Estimating differential expression with edgeR
  - Estimating differential expression with DESeq2
  - Estimating differential expression with Kallisto and Sleuth
  - Using Sleuth to analyze time course experiments
  - Analyzing splice variants with SGSeq
  - Performing power analysis with powsimR
  - Finding unannotated transcribed regions
  - Finding regions showing high expression ab initio using bumphunter
  - Differential peak analysis
  - Estimating batch effects with SVA
  - Finding allele-specific expression with AllelicImbalance
  - Presenting RNA-Seq data using ComplexHeatmap
- Chapter 7: Finding Genetic Variants with HTS Data
  - Technical requirements
  - Finding SNPs and INDELs from sequence data using VariantTools
    - Getting ready
  - Predicting open reading frames in long reference sequences
  - Plotting features on genetic maps with karyoploteR
  - Selecting and classifying variants with VariantAnnotation
  - Extracting information in genomic regions of interest
  - Finding phenotype and genotype associations with GWAS
  - Estimating the copy number at a locus of interest
- Chapter 8: Searching Gene and Protein Sequences for Domains and Motifs
  - Technical requirements
  - Finding DNA motifs with universalmotif
  - Finding protein domains with PFAM and bio3d
  - Finding InterPro domains
  - Finding transmembrane domains with tmhmm and pureseqTM
  - Creating figures of protein domains using drawProteins
  - Performing multiple alignments of proteins or genes
  - Aligning genomic length sequences with DECIPHER
  - Novel feature detection in proteins
  - 3D structure protein alignment in bio3d
- Chapter 9: Phylogenetic Analysis and Visualization
  - Technical requirements
  - Reading and writing varied tree formats with ape and treeio
  - Visualizing trees of many genes quickly with ggtree
  - Quantifying and estimating the differences between trees with treespace
  - Extracting and working with subtrees using ape
  - Creating dot plots for alignment visualizations
  - Reconstructing trees from alignments using phangorn
  - Finding orthologue candidates using reciprocal BLASTs
- Chapter 10: Analyzing Gene Annotations
  - Technical requirements
  - Retrieving gene and genome annotations from BioMart
  - Getting Gene Ontology information for functional analysis from appropriate databases
  - Using AnnoDB packages for genome annotation
  - Using ClusterProfiler for determining GO enrichment in clusters
  - Finding GO enrichment in an Ontology Conditional way with topGO
  - Finding enriched KEGG pathways
  - Retrieving and working with SNPs
- Chapter 11: Machine Learning with mlr3
  - Technical requirements
  - Defining a task and learner to implement k-nearest neighbors (k-NNs) in mlr3
  - Testing the fit of the model using cross-validation
  - Using logistic regression to classify the relative likelihood of two outcomes
  - Classifying using random forest and interpreting it with iml
  - Dimension reduction with PCA in mlr3 pipelines
  - Creating a tSNE and UMAP embedding
  - Clustering with k-means and hierarchical clustering
- Chapter 12: Functional Programming with purrr and base R
- Chapter 13: Turbo-Charging Development in R with ChatGPT
  - Technical requirements
  - Interpreting complicated code with ChatGPT assistance
  - Debugging and improving code with ChatGPT
  - Generating code with ChatGPT
    - Getting ready
  - Writing documentation for R functions with ChatGPT
  - Writing unit tests for R functions with ChatGPT
  - Finding R packages to build a workflow with ChatGPT
- Index
- Other Books You May Enjoy
Product information
- Title: R Bioinformatics Cookbook - Second Edition
- Release date: October 2023
- Publisher(s): Packt Publishing
- ISBN: 9781837634279