Chapter 3. Computing Technology Basics for Life Scientists

In an ideal world, you wouldn’t need to worry too much about computing infrastructure when you’re pursuing your research. In fact, in later chapters we introduce you to systems that are specifically designed to abstract away the nitty-gritty of computing infrastructure in order to help you focus on your science. However, you will find that a certain amount of terminology and concepts are unavoidable in the real world. Investing some effort into learning them will help you to plan and execute your work more efficiently, address performance challenges, and achieve larger scale with less effort. In this chapter, we review the essential components that form the most common types of computing infrastructure, and we discuss how their strengths and limitations inform our strategies for getting work done efficiently at scale. We also go over key concepts such as parallel computing and pipelining, which are essential in genomics because of the need for automation and reproducibility. Finally, we introduce virtualization and lay out the case for cloud infrastructure.

The first few sections in this chapter are aimed at readers who have not had much training, if any, in informatics, programming, or systems administration. If you are a computational scientist or an IT professional, feel free to skip ahead until you encounter something that you don’t already know. The last two sections, which together cover pipelining, virtualization, ...

Get Genomics in the Cloud now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Genomics in the Cloud by Geraldine A. Van der Auwera, Brian D. O'Connor

Chapter 3. Computing Technology Basics for Life Scientists

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly