Chapter 5. First Steps with GATK

In Chapter 4, we set you up to do work in the cloud. This chapter is all about getting you oriented and comfortable with GATK and related tools. We begin by covering the basics, including computational requirements, command-line syntax, and common options. We show you how to spin up the GATK Docker container on the GCP VM you set up in Chapter 4 so that you can run real commands at scale with minimal effort. Then we work through a simple example of variant calling with the most widely used GATK tool, HaplotypeCaller. We explore some basic filtering mechanisms to give you a feel for working with variant calls and their context annotations, which play an important role in filtering results. Finally, we introduce the real-world GATK Best Practices workflows, which are guidelines for getting the best possible results from your variant discovery analyses.

Getting Started with GATK

GATK is an open source software package developed at the Broad Institute. As its full name, the Genome Analysis Toolkit, suggests, GATK is a toolkit rather than a single tool. Here, a tool is the individual functional unit that you invoke by name to perform a particular data transformation or analysis task. GATK contains a fairly large collection of these tools: some convert data from one format to another, some collect metrics about the data, and, most notably, some run actual computational analyses on the data. All of these tools are provided within a single packaged executable. ...
