Book description
Sequence similarity is a powerful tool for discovering biological function. Just as the ancient Greeks used comparative anatomy to understand the human body and linguists used the Rosetta Stone to decipher Egyptian hieroglyphs, today we can use comparative sequence analysis to understand genomes. BLAST (Basic Local Alignment Search Tool) is a sophisticated software package for rapid searching of nucleotide and protein databases. It is one of the most important software packages used in sequence analysis and bioinformatics. Most users of BLAST, however, seldom move beyond the program's default parameters and never take advantage of its full power.
BLAST is the only book completely devoted to this popular suite of tools. It offers biologists, computational biology students, and bioinformatics professionals a clear understanding of BLAST as well as the science it supports. This book shows you how to move beyond the default parameters, get specific answers using BLAST, and interpret your results. The book also contains tutorial and reference sections covering NCBI-BLAST and WU-BLAST, background material to help you understand the statistics behind BLAST, Perl scripts to help you prepare your data and analyze your results, and a wealth of tips and tricks for configuring BLAST to meet your own research needs. Some of the topics covered include:
BLAST basics and the NCBI web interface
How to select appropriate search parameters
BLAST programs: BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX, PHI-BLAST, and PSI-BLAST
Detailed BLAST references, including NCBI-BLAST and WU-BLAST
Understanding biological sequences
Sequence similarity, homology, scoring matrices, scores, and evolution
Sequence alignment
Calculating BLAST statistics
Industrial-strength BLAST, including developing applications with Perl and BLAST
BLAST is the only comprehensive reference with detailed, accurate information on optimizing BLAST searches for high-throughput sequence analysis. This is a book that any biologist should own.
Table of contents
- Table of Contents (1/2)
- Table of Contents (2/2)
- Foreword
- Preface
- Part I
- Part II
- Biological Sequences
- Sequence Alignment
- Sequence Similarity
- Part III
- BLAST
- Anatomy of a BLAST Report
- A BLAST Statistics Tutorial
- Basic BLAST Statistics
- Actual Versus Effective Lengths
- The Raw Score and Bit Score
- The Expect of an HSP
- The WU-BLAST P-Value
- Sum Statistics
- An Expect(n) Means That Sum Statistics Were Applied
- Sum Statistics Are Pair-Wise in Their Focus
- The Sum Score
- Effective Length of a BLASTX Query
- Calculating a Sum Score
- Calculating the Pair-Wise Sum P-Value
- Correcting for Multiple Tests
- Correcting for Database Size
- Frame- and Size-Corrected Expects
- Using Statistics to Understand BLAST Results
- Where Did My Oligo Go?
- 20 Tips to Improve Your BLAST Searches
- 8.1 Don’t Use the Default Parameters
- 8.2 Treat BLAST Searches as Scientific Experiments
- 8.3 Perform Controls, Especially in the Twilight Zone
- 8.4 View BLAST Reports Graphically
- 8.5 Use the Karlin-Altschul Equation to Design Experiments
- 8.6 When Troubleshooting, Read the Footer First
- 8.7 Know When to Use Complexity Filters
- 8.8 Mask Repeats in Genomic DNA
- 8.9 Segment Large Genomic Sequences
- 8.10 Be Skeptical of Hypothetical Proteins
- 8.11 Expect Contaminants in EST Databases
- 8.12 Use Caution When Searching Raw Sequencing Reads
- 8.13 Look for Stop Codons and Frame-Shifts to find Pseudo-Genes
- 8.14 Consider Using Ungapped Alignment for BLASTX, TBLASTN, and TBLASTX
- 8.15 Look for Gaps in Coverage as a Sign of Missed Exons
- 8.16 Parse BLAST Reports with Bioperl
- 8.17 Perform Pilot Experiments
- 8.18 Examine Statistical Outliers
- 8.19 Use links and topcomboN to Make Sense of Alignment Groups
- 8.20 How to Lie with BLAST Statistics
- BLAST Protocols
- BLASTN Protocols (1/3)
- BLASTN Protocols (2/3)
- BLASTN Protocols (3/3)
- BLASTP Protocols
- BLASTX Protocols
- TBLASTN Protocols
- TBLASTX Protocols
- Part IV
- Installation and Command-Line Tutorial
- BLAST Databases
- Hardware and Software Optimizations
- Part V
- NCBI-BLAST Reference
- Usage Statements
- Command-Line Syntax
- blastall Parameters (1/2)
- blastall Parameters (2/2)
- -a [integer]
- -A [integer]
- -b [integer]
- -B [integer]
- -d [database]
- -D [1..23]
- -e [real number]
- -E [integer]
- -f [integer]
- -F [T/F], -F [string]
- -g [T/F]
- -G [integer]
- -i [input file]
- -I [T/F]
- -J [T/F]
- -K [integer]
- -l [file]
- -L [string]
- -m [0..11]
- -M [matrix file]
- -n [T/F]
- -o [output file]
- -p [program name]
- -P [0/1]
- -q [negative integer]
- -Q [1..23]
- -r [integer]
- -R [checkpoint file]
- -S [1..3]
- -t [integer]
- -T [T/F]
- -v [integer]
- -w [integer]
- -W [integer]
- -X [integer]
- -y [integer]
- -Y [real number]
- -z [real number]
- -Z [integer]
- formatdb Parameters
- fastacmd Parameters
- megablast Parameters (1/2)
- megablast Parameters (2/2)
- -a [integer]
- -A [integer]
- -b [integer]
- -d [string]
- -D [0..3]
- -e [real number]
- -E [integer]
- -f [T/F]
- -F [T/F] [string]
- -G [integer]
- -H [integer]
- -i [file]
- -I [T/F]
- -l [file]
- -L [string]
- -m [0..11]
- -M [integer]
- -n [T/F]
- -N [0,1,2]
- -o [file]
- -p [real number]
- -P [integer]
- -q [negative integer]
- -Q [file]
- -r [integer]
- -R [T/F]
- -s [integer]
- -S [0..3]
- -t [16,18,21]
- -T [T/F]
- -U [T/F]
- -v [integer]
- -W [integer]
- -X [integer]
- -y [integer]
- -z [real number]
- -Z [integer]
- bl2seq Parameters
- -a [file]
- -A [T/F]
- -d [real number]
- -D [0/1]
- -e [real number]
- -E [integer]
- -F [T/F] [string]
- -g [T/F]
- -G [integer]
- -i [file]
- -I [integer],[integer]
- -j [file]
- -J [integer],[integer]
- -m [T/F]
- -M [string]
- -o [file]
- -p [string]
- -q [negative integer]
- -r [integer]
- -S [1..3]
- -t [integer]
- -T [T/F]
- -U [T/F]
- -W [integer]
- -X [integer]
- -Y [real number]
- blastpgp Parameters (PSI-BLAST and PHI-BLAST) (1/2)
- blastpgp Parameters (PSI-BLAST and PHI-BLAST) (2/2)
- PSI-BLAST
- PHI-BLAST
- -a [integer]
- -A [integer]
- -b [integer]
- -B [file]
- -c [integer]
- -C [file]
- -d [string]
- -e [real]
- -E [integer]
- -f [integer]
- -F [string]
- -g [T/F]
- -G [integer]
- -h [real number]
- -H [integer]
- -i [file]
- -I [T/F]
- -j [integer]
- -J [T/F]
- -k [file]
- -K [integer]
- -l [string]
- -L [integer]
- -m [0..9]
- -M [string]
- -N [real number]
- -o [file]
- -O [file]
- -p [string]
- -Q [file]
- -R [file]
- -s [T/F]
- -S [integer]
- -t [T/F]
- -T [T/F]
- -U [T/F]
- -v [integer]
- -W [1..3]
- -X [integer]
- -y [real number]
- -Y [real number]
- -z [real number]
- -Z [integer]
- blastclust Parameters
- WU-BLAST Reference
- Usage Statements
- Command-Line Syntax
- WU-BLAST Parameters (1/3)
- WU-BLAST Parameters (2/3)
- WU-BLAST Parameters (3/3)
- altscore=[string]
- B=[integer]
- bottom
- cpus=[integer]
- dbrecmax=[integer]
- dbrecmin=[integer]
- E=[number]
- E2=[number]
- echofilter
- errors
- filter=[string]
- gapE2=[number]
- gapH=[number]
- gapK=[number]
- gapL=[number]
- gapS2=[integer]
- gapsepqmax=[int]
- gapsepsmax=[int]
- gapX
- gi
- golf=[number]
- golmax=[integer]
- gspmax=[integer]
- H=[number]
- hspmax=[integer]
- hitdist=[integer]
- hspsepqmax=[int]
- hspsepsmax=[int]
- K=[number]
- kap
- L=[number]
- lcfilter
- lcmask
- links
- M=[integer]
- maskextra=[integer]
- matrix=[file]
- N=[integer]
- nogap
- nonnegok
- nosegs
- notes
- novalidctxok
- nwlen=[integer]
- nwstart=[integer]
- o=[file]
- olf=[number]
- olmax=[integer]
- postsw
- Q=[integer]
- qoffset=[integer]
- qrecmax=[integer]
- qrecmin=[integer]
- R=[integer]
- restest
- S=[integer]
- S2=[integer]
- seqtest
- span, span1, span2
- T=[integer]
- top
- topcomboN=[integer]
- V=[integer]
- warnings
- wink=[integer]
- wordmask=[method]
- W=[integer]
- X=[integer]
- Y=[number]
- Z=[number]
- xdformat Parameters
- xdget Parameters
- Part VI
- NCBI Display Formats
- Brief Descriptions
- Detailed Descriptions and Examples
- Option 0: Pairwise Alignments
- Query-Anchored Alignments
- Option 1: Query-Anchored Showing Identities
- Option 2: Query-Anchored, No Identities
- Option 3: Flat Query-Anchored Showing Identities
- Option 4: Flat Query-Anchored, No Identities
- Option 5: Query-Anchored, No Identities, and Blunt Ends
- Option 6: Flat Query-Anchored, No Identities, and Blunt Ends
- Option 7: XML
- Option 8: Tabular, Without Comment Lines
- Option 9: Tabular, with Comment Lines
- Option 10: ASN.1 Text Format
- Option 11: ASN.1 Binary Format
- Nucleotide Scoring Schemes
- NCBI-BLAST Scoring Schemes
- blast-imager.pl
- blast2table.pl
- Glossary (1/2)
- Glossary (2/2)
- Index (1/5)
- Index (2/5)
- Index (3/5)
- Index (4/5)
- Index (5/5)
Product information
- Title: BLAST
- Author(s):
- Release date: July 2003
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9780596002992
You might also like
book
INSPIRED, 2nd Edition
Learn to design, build, and scale products consumers can’t get enough of. How do today’s most …
audiobook
The Engineering Executive's Primer
As an engineering manager, you almost always have someone in your company to turn to for …
book
AI at the Edge
Edge AI is transforming the way computers interact with the real world, allowing IoT devices to …
audiobook
What Is Generative AI? (Audio)
ChatGPT, Midjourney, Stable Diffusion, and LLaMA are quickly becoming household names. These tools and many more …