Errata

BLAST

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Printed Page APPENDIX D
blast-imager.pl

This is not an error report.
I ported blast-imager.pl to Python and would like to distribute the Python version on my GitHub. The original Perl code is from this book, and my Python code is basically a translation of it. May I put the Python version in a public GitHub repository and distribute it? If so, what kind of license or statement should I attach? Please advise.

mitsuharu sato  Jul 26, 2022 
Printed Page 33
4th paragraph

"gIn terms of genomes [...]" should be "In terms of genomes [...]"

Anonymous  Jan 18, 2013 
Printed Page 47
Figure 3.6

Local Alignment: Smith-Waterman

Error: Running Blast_example3-2.pl returns a different alignment matrix to the one
displayed in fig 3.6. Iterating through the score and pointer data returns the
following matrices:

>perl Blast_example3-2.pl COELACANTH PELICAN

1: 0 0 0 0 0 0 0 0 0 0
2: 0 0 1 0 0 0 0 0 0 0
3: 0 0 0 2 1 0 0 0 0 0
4: 0 0 0 1 1 0 0 0 0 0
5: 1 0 0 0 0 2 1 0 0 0
6: 0 0 0 0 1 1 3 2 1 0
7: 0 0 0 0 0 0 2 4 3 2
End Of Score Matrix..

1:0 0 0 0 0 0 0 0 0 0
2:0 0 0 0 0 0 0 0 0
3:0 0 0 - 0 0 0 0 0
4:0 0 0 | 0 0 0 0 0
5: 0 0 0 0 - 0 0 0
6:0 0 0 0 | - - 0
7:0 0 0 0 0 0 | - -
End Of Direction Matrix..

# The result is still the same
ELACAN
ELICAN

Anonymous   
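
The matrices in the report above can be checked independently. The following is a minimal Python sketch of Smith-Waterman local alignment, not the book's Blast_example3-2.pl. The function name `smith_waterman` and the scoring values (match +1, mismatch -1, gap -1) are assumptions, chosen because they reproduce the score matrix the customer lists.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Local alignment of seq1 (columns) against seq2 (rows).

    The scoring values are assumed, not taken from the book's listing.
    Returns (best_score, aligned_seq1, aligned_seq2, score_matrix).
    """
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    pointer = [[None] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score and pointer matrices.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq1[j - 1] == seq2[i - 1] else mismatch
            diag = score[i - 1][j - 1] + pair
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            value = max(0, diag, up, left)
            score[i][j] = value
            if value == 0:
                pointer[i][j] = None          # a local alignment may start here
            elif value == diag:
                pointer[i][j] = "diag"
            elif value == up:
                pointer[i][j] = "up"
            else:
                pointer[i][j] = "left"
            if value > best:
                best, best_pos = value, (i, j)

    # Trace back from the highest-scoring cell until a cell with no pointer.
    i, j = best_pos
    out1, out2 = [], []
    while pointer[i][j] is not None:
        move = pointer[i][j]
        if move == "diag":
            out1.append(seq1[j - 1])
            out2.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            out1.append("-")
            out2.append(seq2[i - 1])
            i -= 1
        else:
            out1.append(seq1[j - 1])
            out2.append("-")
            j -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2)), score


best, aligned1, aligned2, matrix = smith_waterman("COELACANTH", "PELICAN")
print(best)
print(aligned1)
print(aligned2)
```

Under those assumed scores, the best local alignment of COELACANTH and PELICAN scores 4 and pairs ELACAN with ELICAN, in agreement with the result quoted in the report.
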
Printed Page 50
second paragraph, last sentence

O(n2) should be O(n^2) or "order n-squared"

Anonymous   
Printed Page 56
top

equations for H are missing 1/p

Anonymous   
Printed Page 58
equation 4-3

I think there should be parentheses around p_i*p_j so that the equation reads log(q_ij/(p_i*p_j))

Without them the equation as it stands is technically

log((q_ij/p_i)*p_j)

Anonymous   
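
The effect of the missing parentheses is easy to see numerically. This small Python sketch uses made-up frequencies and a base-2 logarithm (both assumptions; the report does not give values or a log base) to compare the intended reading with the literal one.

```python
import math

# Hypothetical frequencies, chosen only to make the arithmetic easy to follow.
q_ij = 0.02   # frequency of the aligned pair (i, j)
p_i = 0.1     # background frequency of residue i
p_j = 0.05    # background frequency of residue j

# Intended reading: the pair frequency divided by the product of backgrounds.
intended = math.log2(q_ij / (p_i * p_j))

# Literal reading without parentheses: division and multiplication
# associate left to right, so this is (q_ij / p_i) * p_j.
as_printed = math.log2(q_ij / p_i * p_j)

print(intended, as_printed)
```

With these numbers the intended form gives a positive log-odds score while the literal form gives a negative one, so the two readings are not interchangeable.
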
Printed Page 58
protein subsequences

The final protein segment (CYB_TRYBB/253 according to Bejerano's 2003 thesis "Automata Learning & Stochastic Modeling for Biosequence Analysis", the last segment listed on p. 58) is confusing me. How is it similar to the other segments shown on this page?

The PEW triplet is aligned, but beyond that, I'm having a hard time identifying other similarities.

As far as alignment goes, its Levenshtein distance is 35 substitutions from CYBA_STELO/258 (6th segment on p. 58) for only 51 positions. On the other hand, if you use the segments from PETD_SYNP2/65 (1st segment) and PETD_CHLEU/65 (2nd segment), the Levenshtein distance is only 13 substitutions.

Summing instances of amino acids also shows that CYB_TRYBB/253 is an outlier. For instance, Phenylalanine (F) appears 3 or 4 times for all of the other segments, but 9 times for CYB_TRYBB/253.

Why is this segment included? Is the PEW triplet similarity enough to qualify the entire segment as similar? Is it weighted? Are there other factors that make this segment similar?

I think the answer and a bit more context will help me understand why we need the lod score from Dayhoff. I am pre-grok.

Thanks!

Kim Raymoure  Jan 18, 2013 
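
The two checks described in this report (counting substitutions between equal-length segments and tallying amino acids) can be sketched in a few lines of Python. The sequences below are hypothetical toy strings, not the segments printed on page 58, and the function names are illustrative. Note that a substitution-only count between equal-length strings is the Hamming distance, which is an upper bound on the Levenshtein distance.

```python
from collections import Counter


def substitutions(a, b):
    """Count mismatched positions between two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def composition(seq):
    """Return a tally of how often each residue letter occurs."""
    return Counter(seq)


# Hypothetical toy segments, NOT the sequences from the book.
seg_a = "PEWYFLFAYA"
seg_b = "PEWFFLFFYF"

print(substitutions(seg_a, seg_b))
print(composition(seg_a)["F"], composition(seg_b)["F"])
```

Run on the real segments, the same two functions would give the substitution counts and phenylalanine tallies that the report compares.
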
Printed Page 59
Fig. 4-2

Matrix is not 20 x 20. Valine is left out of rows. No way to find out value of V to V.

Anonymous   
Printed Page 59
Figure 4-2. Blosum62 scoring matrix

Missing last column corresponding to the V amino acid.

Anonymous   
Printed Page 61
Equation 4-4

The sum Sum_{i=1..n} Sum_{j=1..i} q_ij is not the sum of ALL frequencies; it is only
the sum of the lower diagonal of the q_ij.
For example, if n=4, then q_12 is not a member of the sum in the formula.

Anonymous   
Printed Page 62
Equations 4-7, 4-8, 4-9

The j index should go from 1 to n, not from 1 to i. You want to sum ALL
scores, not just the lower diagonal ones. Example 4-1 on page 63 confirms this. If
you only sum the lower diagonal then $expected_score = $match * 0.25 + $mismatch *
0.75 / 2.

AUTHOR: The equations are correct as they stand. The reason is that
in every case the value of xij is the same as xji. If you take both
halves of the diagonal then you are going to be off by a factor
of two.

Anonymous   
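
The disagreement between the report and the author's reply may come down to what q_ij denotes, which this errata list does not state. The Python sketch below is an illustration under assumptions: uniform base frequencies of 0.25 and hypothetical scores of +1 for a match and -1 for a mismatch. It compares the full double sum with the triangular sum (j from 1 to i) under two conventions for the off-diagonal frequencies.

```python
bases = "ACGT"
p = {b: 0.25 for b in bases}      # assumed uniform background frequencies
match, mismatch = 1, -1           # hypothetical scores for illustration


def score(a, b):
    return match if a == b else mismatch


# Full sum over all ordered pairs (i, j), with q_ij = p_i * p_j.
full = sum(p[a] * p[b] * score(a, b) for a in bases for b in bases)

# Triangular sum where q_ij is still the ORDERED pair frequency p_i * p_j.
tri_ordered = sum(
    p[a] * p[b] * score(a, b)
    for i, a in enumerate(bases)
    for b in bases[: i + 1]
)

# Triangular sum where each off-diagonal q_ij is the UNORDERED pair
# frequency, i.e. it already counts both (i, j) and (j, i).
tri_unordered = sum(
    (p[a] * p[b] if a == b else 2 * p[a] * p[b]) * score(a, b)
    for i, a in enumerate(bases)
    for b in bases[: i + 1]
)

print(full, tri_ordered, tri_unordered)
```

In this sketch the triangular sum equals the full sum only when each off-diagonal q_ij already counts both orderings of the pair. With ordered-pair frequencies it yields match * 0.25 + mismatch * 0.75 / 2, as the report states.
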
Printed Page 62
Equation 4-9

According to Henikoff, S. and Henikoff, J.G. 1992 (http://www.pnas.org/cgi/reprint/89/22/10915.pdf), "H" should correspond to a measure of mutual information or relative entropy. It appears to me that there should not be a minus sign in front of the right-side summation. A related concern is that "H" as defined in this text may not necessarily be positive. In exceptional cases where there is no dependence between two matrices, "H" can be zero.

I think that relative entropy may not be the most suitable term to describe this equation, albeit being used in so many references about scoring sequence alignments. Relative entropy should be considered as a measure of the difference between two probability distributions, where mutual information, as expressed in the form of Equation 4-9, can be intuitively taken as a measure of dependence between two random variables. Scoring sequence alignments appears to correspond to the latter situation.

Anonymous   
Printed Page 62
Equation 4-9

I have the same question regarding this errata report:
--
According to Henikoff, S. and Henikoff, J.G. 1992 (http://www.pnas.org/cgi/reprint/89/22/10915.pdf), "H" should correspond to a measure of mutual information or relative entropy. It appears to me that there should not be a minus sign in front of the right-side summation. A related concern is that "H" as defined in this text may not necessarily be positive. In exceptional cases where there is no dependence between two matrices, "H" can be zero.

I think that relative entropy may not be the most suitable term to describe this equation, albeit being used in so many references about scoring sequence alignments. Relative entropy should be considered as a measure of the difference between two probability distributions, where mutual information, as expressed in the form of Equation 4-9, can be intuitively taken as a measure of dependence between two random variables. Scoring sequence alignments appears to correspond to the latter situation.

--
My prime concern is still the dubious negative sign of Equation 4-9, since Example 4-1 on page 63 also calculates relative entropy without the negative sign.

Anonymous  Mar 27, 2010 
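
Both reports on Equation 4-9 turn on the sign of H. The following Python sketch computes the sum of q_ij * log2(q_ij / (p_i * p_j)) without a leading minus sign, for two made-up sets of target frequencies. The function name, the numbers, the base-2 logarithm, and the choice to sum over all ordered pairs are assumptions for illustration, not the book's Example 4-1.

```python
import math


def relative_entropy(q, p):
    """Sum of q[i][j] * log2(q[i][j] / (p[i] * p[j])) over all ordered pairs."""
    h = 0.0
    for i, row in enumerate(q):
        for j, q_ij in enumerate(row):
            if q_ij > 0:                      # a zero frequency contributes nothing
                h += q_ij * math.log2(q_ij / (p[i] * p[j]))
    return h


p = [0.25, 0.25, 0.25, 0.25]                  # assumed uniform backgrounds

# Hypothetical target frequencies that favor identical pairs.
# Each row sums to 0.25, so the marginals agree with p.
q_related = [[0.16 if i == j else 0.03 for j in range(4)] for i in range(4)]

# Target frequencies with no dependence at all: q_ij = p_i * p_j.
q_independent = [[p[i] * p[j] for j in range(4)] for i in range(4)]

h_related = relative_entropy(q_related, p)
h_independent = relative_entropy(q_independent, p)
print(h_related, h_independent)
```

Written this way, the sum is positive when the target frequencies differ from the background product and zero when they are independent, which is consistent with the reports' view that a leading minus sign would make H negative.
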
Printed Page 103
Figure 7-2

I have a question about the Sum score formula.

The summation runs from i = 1 to r. However, the individual S_i are not used in the summation. Only the constant S_r is used. i is not referenced anywhere in the summation.

Is this correct?

Anonymous   
Printed Page 112
Code listing at bottom of page, 3rd paragraph

In the Perl Code listing, variable $n may be incorrectly commented:

'#actual length of query'

I think $n should be:

'# actual number of letters in database'

Anonymous   
Printed Page 170
2nd paragraph, right below "Command-Line Tutorial"

The last part of the URL provided for the examples has "BLAST" in all upper case.

It turns out the URL is case sensitive, and this should be all lower case,
i.e., "blast", to avoid the "404 Not Found" error.

Anonymous   
Printed Page 173
10.3.1.5. blastx

blastall -p blastx -d globins -i fugu_genomic > ncbi-blastx_test

should be:

blastall -p blastx -d globins -i fugu_globin > ncbi-blastx_test

Anonymous   
Printed Page 215
second paragraph for CPUs and Computer Architecture

"Two benchmarks are provided Table 12-3." should be "Two benchmarks are provided Table 12-2."

Jamming  Oct 16, 2009 
Printed Page 264
blastclust parameters

The reference entry for the important parameter "-S" (score coverage threshold) is missing.

Jamming  Oct 19, 2009 
Printed Page 310
line starting '$expect = "1$expect' ...

This line should be changed FROM:
$expect = "1$expect" if $expect = ~/^e/;

... TO:
$expect = "1$expect" if $expect =~ /^e/;

The printed version will not correctly use your desired minimum expect value to filter the table entries.

Anonymous
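
The corrected line prepends "1" to expect values that begin with "e" (such as "e-100"), since a bare exponent cannot be compared numerically. In Perl, writing `= ~` instead of `=~` turns the pattern match into an assignment of a bitwise complement, so the test no longer inspects $expect at all. As an illustration in Python rather than the book's Perl, the same normalization might look like the sketch below; the function name `normalize_expect` is hypothetical.

```python
import re


def normalize_expect(expect):
    """Convert an expect string to a float, adding the implied leading 1."""
    # re.match anchors at the start of the string, like ^e in the Perl pattern.
    if re.match(r"e", expect):
        expect = "1" + expect
    return float(expect)


print(normalize_expect("e-100"))
print(normalize_expect("2e-5"))
```

Values that already carry a mantissa pass through unchanged, while a bare exponent is read as one times that power of ten.
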