Book description
Life scientists today urgently need training in bioinformatics skills. Too many bioinformatics programs are poorly written and barely maintained, usually by students and researchers who've never learned basic programming skills. This practical guide shows postdoc bioinformatics professionals and students how to exploit the best parts of Python to solve problems in biology while creating documented, tested, reproducible software.
Ken Youens-Clark, author of Tiny Python Projects (Manning), demonstrates not only how to write effective Python code but also how to use tests to write and refactor scientific programs. You'll learn the latest Python features and tools including linters, formatters, type checkers, and tests to create documented and tested programs. You'll also tackle 14 challenges in Rosalind, a problem-solving platform for learning bioinformatics and programming.
- Create command-line Python programs to document and validate parameters
- Write tests to verify and refactor programs and confirm they're correct
- Address bioinformatics ideas using Python data structures and modules such as Biopython
- Create reproducible shortcuts and workflows using makefiles
- Parse essential bioinformatics file formats such as FASTA and FASTQ
- Find patterns of text using regular expressions
- Use higher-order functions in Python like filter(), map(), and reduce()
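To give a flavor of the last item, here is a minimal sketch (not taken from the book) of how filter(), map(), and reduce() might each be used to count G and C bases in a DNA string. The sequence and variable names are hypothetical and chosen only for illustration.

```python
from functools import reduce

# Hypothetical example sequence, not from the book
seq = "ACGTGGCA"

# filter(): keep only the bases that are G or C
gc_bases = list(filter(lambda base: base in "GC", seq))

# map(): turn each base into a boolean; True counts as 1 when summed
gc_count = sum(map(lambda base: base in "GC", seq))

# reduce(): fold the whole sequence into a single running count
gc_reduced = reduce(lambda total, base: total + (base in "GC"), seq, 0)

# GC content as a percentage of the sequence length
gc_percent = 100 * gc_count / len(seq)
```

All three approaches arrive at the same count; which one reads best is largely a matter of taste.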
Table of contents
- Preface
- Who Should Read This?
- Programming Style: Why I Avoid OOP and Exceptions
- Structure
- Test-Driven Development
- Using the Command Line and Installing Python
- Getting the Code and Tests
- Installing Modules
- Installing the new.py Program
- Why Did I Write This Book?
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. The Rosalind.info Challenges
- 1. Tetranucleotide Frequency: Counting Things
- 2. Transcribing DNA into mRNA: Mutating Strings, Reading and Writing Files
- Getting Started
- Defining the Program's Parameters
- Defining an Optional Parameter
- Defining One or More Required Positional Parameters
- Using nargs to Define the Number of Arguments
- Using argparse.FileType() to Validate File Arguments
- Defining the Args Class
- Outlining the Program Using Pseudocode
- Iterating the Input Files
- Creating the Output Filenames
- Opening the Output Files
- Writing the Output Sequences
- Printing the Status Report
- Using the Test Suite
- Solutions
- Benchmarking
- Going Further
- Review
- 3. Reverse Complement of DNA: String Manipulation
- 4. Creating the Fibonacci Sequence: Writing, Testing, and Benchmarking Algorithms
- 5. Computing GC Content: Parsing FASTA and Analyzing Sequences
- Getting Started
- Solutions
- Solution 1: Using a List
- Solution 2: Type Annotations and Unit Tests
- Solution 3: Keeping a Running Max Variable
- Solution 4: Using a List Comprehension with a Guard
- Solution 5: Using the filter() Function
- Solution 6: Using the map() Function and Summing Booleans
- Solution 7: Using Regular Expressions to Find Patterns
- Solution 8: A More Complex find_gc() Function
- Benchmarking
- Going Further
- Review
- 6. Finding the Hamming Distance: Counting Point Mutations
- Getting Started
- Solutions
- Solution 1: Iterating and Counting
- Solution 2: Creating a Unit Test
- Solution 3: Using the zip() Function
- Solution 4: Using the zip_longest() Function
- Solution 5: Using a List Comprehension
- Solution 6: Using the filter() Function
- Solution 7: Using the map() Function with zip_longest()
- Solution 8: Using the starmap() and operator.ne() Functions
- Going Further
- Review
- 7. Translating mRNA into Protein: More Functional Programming
- 8. Find a Motif in DNA: Exploring Sequence Similarity
- 9. Overlap Graphs: Sequence Assembly Using Shared K-mers
- 10. Finding the Longest Shared Subsequence: Finding K-mers, Writing Functions, and Using Binary Search
- 11. Finding a Protein Motif: Fetching Data and Using Regular Expressions
- 12. Inferring mRNA from Protein: Products and Reductions of Lists
- 13. Location Restriction Sites: Using, Testing, and Sharing Code
- 14. Finding Open Reading Frames
- II. Other Programs
- 15. Seqmagique: Creating and Formatting Reports
- 16. FASTX grep: Creating a Utility Program to Select Sequences
- 17. DNA Synthesizer: Creating Synthetic Data with Markov Chains
- 18. FASTX Sampler: Randomly Subsampling Sequence Files
- 19. Blastomatic: Parsing Delimited Text Files
- A. Documenting Commands and Creating Workflows with make
- B. Understanding $PATH and Installing Command-Line Programs
- Epilogue
- Index
- About the Author
Product information
- Title: Mastering Python for Bioinformatics
- Author(s): Ken Youens-Clark
- Release date: May 2021
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098100889