Chapter 7. Unix Data Tools
We often forget how science and engineering function. Ideas come from previous exploration more often than from lightning strokes.
John W. Tukey
In Chapter 3, we learned the basics of the Unix shell: using streams, redirecting output, pipes, and working with processes. These core concepts allow us not only to use the shell to run command-line bioinformatics tools, but also to leverage Unix as a modular work environment for working with bioinformatics data. In this chapter, we’ll see how we can combine the Unix shell with command-line data tools to explore and manipulate data quickly.
Unix Data Tools and the Unix One-Liner Approach: Lessons from Programming Pearls
Understanding how to use Unix data tools in bioinformatics isn’t only about learning what each tool does; it’s about mastering the practice of connecting tools together to create programs from Unix pipelines. By connecting data tools together with pipes, we can construct programs that parse, manipulate, and summarize data. Unix pipelines can be developed in shell scripts or as “one-liners”: tiny programs built by connecting Unix tools with pipes directly on the shell. Whether in a script or as a one-liner, building more complex programs from small, modular tools capitalizes on the design and philosophy of Unix (discussed in “Why Do We Use Unix in Bioinformatics? Modularity and the Unix Philosophy”). The pipeline approach to building programs is a well-established tradition in ...
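To make the idea of a one-liner concrete, here is a minimal sketch of a pipeline that summarizes a small tab-delimited file. The file name, its contents, and the choice of tools (cut, sort, and uniq) are assumptions made for illustration and are not taken from the text above.

```shell
# Hypothetical example data: a tiny three-column, tab-delimited file
# (chromosome, start, end). The file name and its contents are made up.
tmpdir=$(mktemp -d)
printf 'chr1\t100\t200\nchr2\t150\t300\nchr1\t400\t500\nchr3\t10\t20\nchr1\t600\t700\n' > "$tmpdir/example.bed"

# The one-liner: count how many rows fall on each chromosome.
#   cut -f1   keeps only the first column
#   sort      puts identical values next to each other so uniq can count them
#   uniq -c   prefixes each distinct value with the number of times it occurs
#   sort -rn  orders the result by count, largest first
counts=$(cut -f1 "$tmpdir/example.bed" | sort | uniq -c | sort -rn)
echo "$counts"

# Clean up the temporary example data.
rm -r "$tmpdir"
```

Each stage does one small job and passes its output to the next through a pipe; none of the tools knows anything about the others. With the example data above, the first line of output reports chr1 with a count of 3. The same pipeline could be saved in a shell script instead of being typed directly on the shell.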