Chapter 14. Making a Fully Reproducible Paper

Throughout this book, you’ve been learning how to use an array of individual tools and components to perform specific tasks.  You now know everything you need to get work done—in small pieces. In this final chapter, we walk you through a case study that demonstrates how to bring all of the pieces together into an end-to-end analysis, with the added bonus of showcasing methods for ensuring full computational reproducibility.

The challenge posed in the case study is to reproduce a published analysis in which researchers identified the contribution of a particular gene to the risk for a form of congenital heart disease. The original study was performed on controlled-access data, so the first part of the challenge is to generate a synthetic dataset that can be substituted for the original. Then, we must re-create the data processing and analysis, which include variant discovery, effect prediction, prioritization, and clustering based on the information provided in the paper. Finally, we must apply these methods to the synthetic dataset and evaluate whether we can successfully reproduce the original result. In the course of the chapter, we derive lessons from the challenges we face that should guide you in your efforts to make your own work computationally reproducible.

Overview of the Case Study

We originally conceived this case study as a basis for a couple of workshops that we had proposed to deliver at conferences, starting with the ...

Get Genomics in the Cloud now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.