Errata for Baseball Hacks

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
Printed
Page xvi

Please add the following on page xvi, under "Conventions used in this book":

Throughout this book, we use "~ %" or "H:>" as prompts to indicate
commands that you type on the command line, depending on whether we
used a Windows or a Unix-type system. Most commands shown in this book
will work with either system; just type whatever they are after the prompt.

Anonymous   
Printed
Page 1
Buy and install WinZip from http://www.winzip.com/downwzeval.htm

Anonymous   
Printed
Page 2
Get the WinZip download enhancements from http://www.winzip.com/downcl.htm

Anonymous   
Printed
Page 3
In lines 7 & 8, substitute the command "type" for "cat"

Anonymous   
Printed
Page 4
Change line 18 from

print `unzip -qq -o $archive`;
to
print `wzunzip $archive`;
This only works correctly with licensed copies of WinZip, so pay for your copy.

If you don't want to pay for WinZip, try http://www.7-zip.org/, and change line 18 from
print `unzip -qq -o $archive`;
to
print `7z x $archive`;

Anonymous   
Printed
Page 5
Change line 33 from

print `rm $file`;
to
print `del $file`;

Anonymous   
Printed
Page 13
in Hack #2

"The box score tells you that Bucky had a pretty
good game, scoring three of the Yankees' five runs."

CHANGE TO:

"The box score tells you that Bucky had a pretty good
game, batting in three of the Yankees' five runs."

Anonymous   
Printed
Page 24 & 25

Bottom of page 24: "HR7/L.1-H;2-H;3-H
Grand slam home run to center field."

This should read:
"Inside the park, grand slam home run on a line drive to left field."

Top of page 25: "54/SAC/BG.2-3;1-2
<snip> Runners on first and third advance."

This should read:
"Runners on first and second advance."

Anonymous   
Printed
Page 30
Last paragraph of Step 5

First sentence currently reads "There are a lot of reasons why a pitcher might shake off a batter."

In context, it appears that it should read "There are a lot of reasons why a pitcher might shake off a
'catcher/catcher's signs'."

Note from the Author or Editor:
yup, correct

Anonymous   
Printed
Page 34
4th para

The Conquery link no longer works from the Baseball Prospectus site. It appears that the search tool may
have been removed.

Note from the Author or Editor:
Probably correct. Web sites are always changing. This book isn't very old, but a lot has still changed in the last 4 years...

Anonymous   
Printed
Page 51

The file I unzipped was called BDB-sql-2005-08-02.sql. You can...

~ % mysql -u jadler -p -s bbdatabank < BDB-sql-2004-12-02.sql
should be:
~ % mysql -u jadler -p -s bbdatabank < BDB-sql-2005-08-02.sql

Anonymous   
Printed
Page 55
Bottom of page 55

"download and install the Microsoft Access database from the Baseball Reference."
Should say Baseball Archive.

Anonymous   
Printed
Page 60
Top of page 60

AVG: [R]/[AB]
TB: [R]+[2B]+2*[3B]+3*[HR]

Both [R] should be [H].
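
With [H] in place of [R], the two expressions can be checked with a short Python sketch. The function names and the sample stat line are illustrative assumptions and do not come from the book; only the two formulas are taken from the correction.

```python
def batting_average(h, ab):
    """AVG = H / AB; returns 0.0 when there are no at-bats."""
    return h / ab if ab else 0.0


def total_bases(h, doubles, triples, hr):
    """TB = H + 2B + 2*3B + 3*HR (H already counts one base per hit)."""
    return h + doubles + 2 * triples + 3 * hr


# Hypothetical stat line: 150 hits in 500 at-bats,
# including 30 doubles, 5 triples, and 20 home runs.
avg = batting_average(150, 500)
tb = total_bases(150, 30, 5, 20)
print(f"AVG={avg:.3f} TB={tb}")
```

Using runs ([R]) in either expression would mix scoring with hitting, which is why both fields need to be [H].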

Anonymous   
Printed
Page 66
middle of the page

Below the sentence:
"Once you install the table parser, repeat the process for the following modules:"
a few modules are listed; "HTML-TableContentParser" should be removed.

Anonymous   
Printed
Page 66
middle of the page

The text has listed "XML::Simple" as a module that requires separate
installation apart from the TableContentParser (see
http://search.cpan.org/~grantm/XML-Simple-2.14/lib/XML/Simple.pm),
but it is worth noting that XML-Simple works with Activestate.

Anonymous   
Printed
Page 69
Line 2

The first line of the @positions Perl code needs a ";" at the end.

Anonymous   
Printed
Page 72
Bottom of page 72

"You tell Perl that you want to use an object through the use command."

This should read:
"You tell Perl that you want to use a module through the use command."

Anonymous   
Printed
Page 73
Top of page 73

"The second new returns a new FileHandle object."

This should read:
"The second line returns a new FileHandle object."

Anonymous   
Printed
Page 77
Hack 14 - fetchretro.pl compatibility problem

Windows users will need to insert the line:
binmode($fh);
after:
my $fh = new FileHandle ">$filename";
in both loops or they will not be able to use these files.
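
The same concern exists outside Perl. As a minimal Python sketch (not the book's script; the file name and payload are hypothetical), opening the output file in binary mode plays the role of binmode($fh) and keeps Windows from translating newline bytes in downloaded data:

```python
import os
import tempfile


def save_bytes(filename, payload):
    """Write payload unchanged; mode 'wb' is the analogue of Perl's binmode($fh)."""
    with open(filename, "wb") as fh:
        fh.write(payload)


# Hypothetical binary payload that happens to contain a newline byte.
payload = b"PK\x03\x04\n\x00data"
path = os.path.join(tempfile.mkdtemp(), "example.zip")
save_bytes(path, payload)

# Reading back in binary mode returns exactly the bytes that were written.
with open(path, "rb") as fh:
    round_trip = fh.read()
print(round_trip == payload)
```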

Anonymous   
Printed
Page 81
"-i" option

Add a note under the description:
"Please note that case matters when using the BEVENT, BOX, or BGAME
tools. Be sure to capitalize the game identifiers."

Anonymous   
Printed
Page 82
Last Paragraph

The note on this page states "Surprisingly, the World Series was named after The New York World, a now-defunct newspaper."

This is believed to be false. See http://www.snopes.com/business/names/worldseries.asp, which references this comment from the Baseball Hall of Fame:

'Others have asked that question of the staff at the Baseball Hall of Fame in Cooperstown, N.Y. in recent weeks. "There's no evidence suggesting it was ever sponsored by the New York World newspaper," said Hall of Fame researcher Eric Enders. When the World Series between the National and American leagues began in 1903, the owners borrowed the name from the world championship series held in the 1880s between the National League and the American Association. Enders concludes the name didn't originate from the name of the long-defunct newspaper. It sounds like an urban myth.'


Thanks for a great book!

Note from the Author or Editor:
I agree with the correction. This is now believed to be false, though you can still find some books that make this claim.

Mike Yacullo  Jan 18, 2009 
Printed
Page 84
Top of page 84, first paragraph of Hack #16

"This hack describes the basics of databases and introduces
Simple Query Language (SQL)"

This should read:
"This hack describes the basics of databases and introduces
Structured Query Language (SQL)"

Anonymous   
Printed
Page 85
Top of page 85

"Notice the semicolon at the end of the command. (Each SQL command
needs to end with a semicolon.)"

This should read:
"Notice the semicolon at the end of the command. (If you use the MySQL
command line interface then, by default, you must end each SQL command
with a semicolon. See the MySQL documentation for more information.)"

Anonymous   
Printed
Page 90
There is a typo in the MySQL code.

This is the complete, correct version:

select name, round(TB/AB, 3) as SLG
from (select (H + 2B + 2 * 3B + 3 * HR) as TB,
AB, yearID, r.franchID as teamID, lgID, name
from teams l inner join teamsFranchises r
on l.idxTeamsFranchises=r.idxTeamsFranchises) t
where t.yearID=2000 and t.lgID="AL";

Anonymous   
Printed
Page 94
Page 94, hack #18

Older versions of Microsoft Access may not support subqueries, although
Microsoft Access 2003 does. See the Microsoft Access help files for details.

Anonymous   
Printed
Page 108
translate.pl script

The translate.pl script for Hack 22 doesn't currently work on Windows.
Here's two ways to fix this problem

Anonymous   
Printed
Page 109
2nd paragraph

For convenience, these scripts and their output are also included on this book's web site.
should be:
For convenience, these scripts are also included on this book's web site.

Anonymous   
Printed
Page 125
Running the hack

To run the hack, you type perl=get_fielding.pl at the command prompt

Surely the equals sign is a mistake.

Note from the Author or Editor:
Just checked my copy. This is a typo; there should be a space in place of the "=" sign.

Anonymous  Jan 19, 2009 
Printed
Page 129
Figure 3-9 shows Ethereal, not the proxy sniffer.

Change the preceding paragraph to remove the reference
to the figure, and change the paragraph on Ethereal to note that the
program is shown in Figure 3-9.

Anonymous   
Printed
Page 132
CREATE TABLE fielding

the line "pos CHAR(2)" should be "pos VARCHAR(32)"

Anonymous   
Printed
Page 134
bottom third

The base url for update_db.pl script omits the '/mlb' directory. It should read 'http://gd2.mlb.com/components/game/mlb/year_' instead of 'http://gd2.mlb.com/components/game/year_'.

Note from the Author or Editor:
We should place a general note with the book noting that this book was written in 2006; all the code worked perfectly with the MLB web site in 2006. Unfortunately, the MLB web site is a moving target, and code that worked in 2006 isn't likely to work in 2009. (Nor would I expect code that worked in 2009 to work in 2012.)

Anonymous  Apr 14, 2009 
Printed
Page 140
Informational note (not to be changed in reprint)

Top of the page:
For the 2006 season, MLB changed the format of the Gameday XML files. The field 'da'
no longer exists and is now 'a'. Accordingly change $batter->{da} to $batter->{a} in
the save_to_db module.

(Also note that if you want to run the bootstrapping script against the 2006 season,
you will need to modify the load_db script to iterate starting from April 3rd, 2006.
IMPORTANT: The first game of the regular season took place on April 2 between the
Indians (CLE) and the White Sox (CHA). This game will have to be loaded separately as
preseason games also were played on that date.)
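
One way to cope with this kind of rename is to look for the new attribute first and fall back to the old one. The following Python sketch is only an illustration of that idea, not the book's Perl module: the `batter` element name, the `id` attribute, and the sample fragments are assumptions, and only the attribute names 'da' and 'a' come from the note above.

```python
import xml.etree.ElementTree as ET


def batter_stat(batter):
    """Return the value stored under 'a' (2006 format), else under 'da' (older format)."""
    value = batter.get("a")
    if value is None:
        value = batter.get("da")
    return value


# Hypothetical fragments; real Gameday files carry many more attributes.
old_style = ET.fromstring('<batter id="1" da="2"/>')
new_style = ET.fromstring('<batter id="1" a="3"/>')
print(batter_stat(old_style), batter_stat(new_style))
```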

Anonymous   
Printed
Page 142
Hack 28 2nd paragraph

"Most of these web sites include detailed test descriptions..."

The word 'test' should be 'text'.

Anonymous   
Printed
Page 153
2nd paragraph

"Figure 3-11 shows what this file looks like when you open it in the Firefox
browser."

should be "in the Internet Explorer browser."

Anonymous   
Printed
Page 163
Bottom of the page

The command:

install.packages(pkgs=list("Rcmdr")

should be:

install.packages(pkgs="Rcmdr")

Note from the Author or Editor:
I think this is now correct; as of R 2.9.0, the pkgs argument to install.packages should be a character vector and not a list. So, you could also use the command:

install.packages(pkgs=c("Rcmdr"))

or to install multiple packages (for example, Rcmdr and hexbin),

install.packages(pkgs=c("Rcmdr","hexbin"))

Anonymous   
Printed
Page 173
source code at bottom of page

The following line refers to a table that does not exist in the database:

events <- sqlFetch(channel, "events", max=9)

Author's note: The code on the bottom of page 173 shows a database that is not discussed elsewhere
in the book; it's a database that I created using code similar to the code used in hack #28.

As an illustrative example, this one is still valid: it shows how to use ODBC from R. However, readers
won't be able to recreate the results shown.

Anonymous   
Printed
Page 183
2nd paragraph

"In this hack, I examine statistics from 2004"

should be 2003

Anonymous   
Printed
Page 184
middle of page, under "#Plot the charts"

histogram(~ AVG | teamID), nint=10
densityplot(~ AVG | teamID), plot.points=FALSE

should be:
histogram(~ AVG | teamID, nint=10)
densityplot(~ AVG | teamID, plot.points=FALSE)

Anonymous   
Printed
Page 186
Middle of page

"Here are mappings you can use to find teams that are similar offensively:"

The hack actually uses a mix of offensive and defensive statistics. This
should read:
"Here are mappings you can use to find teams that are similar:"

Anonymous   
Printed
Page 193
Last line of text

"directly from the Web and from R..." should be "directly from the Web and into R"

Anonymous   
Printed
Page 198
Perl code

The Perl script on page 198 should select the HTML table and rows
as follows:

# WE'RE INTERESTED IN THE 2ND HTML TABLE IN THE PAGE
$ts = $te->table_state(0,1);
@rows = $ts->rows;

# HOW MANY HTML TABLE ROWS?
$N = scalar(@rows);

# PRINT OUT THE COLUMN HEADERS
print OUT "TEAM|" . join("|", @{$rows[1]}) . "\n";

# FOR REST OF ROWS, PIPE-DELIMIT DATA PLUS A LINEFEED
for $i (2 .. $N-4) {
    print OUT "$TeamID|";
    print OUT join("|", @{$rows[$i]});
    print OUT "\n";
}

Anonymous   
Printed
Page 241
second paragraph

There are some typos in the displayed formula for batter runs. It currently reads:

BR=(.46X1B+.85X2B+1.02X3B+1.4HR+.33((BB+HBP)+.22-SB-.35CS-.26(AB-H)))

it should read:

BR=.46X1B+.85X2B+1.02X3B+1.4HR+.33(BB+HBP)+.22SB-.35CS-.26(AB-H)
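
Reading each "X" as multiplication, the corrected formula can be written as a small Python function. This is a sketch for checking the arithmetic; the function name, argument names, and the sample season are illustrative assumptions, and the coefficients are copied from the corrected line above.

```python
def batter_runs(singles, doubles, triples, hr, bb, hbp, sb, cs, ab, h):
    """Batter runs per the corrected formula; (ab - h) counts the batter's outs."""
    return (0.46 * singles
            + 0.85 * doubles
            + 1.02 * triples
            + 1.4 * hr
            + 0.33 * (bb + hbp)
            + 0.22 * sb
            - 0.35 * cs
            - 0.26 * (ab - h))


# Hypothetical season: 155 hits (100 singles, 30 doubles, 5 triples, 20 home runs)
# in 500 at-bats, with 60 walks, 5 hit-by-pitches, 10 steals, 4 times caught stealing.
br = batter_runs(100, 30, 5, 20, 60, 5, 10, 4, 500, 155)
print(round(br, 2))
```

Note that in the misprinted version the stolen-base term appears as "+.22-SB", which would subtract stolen bases instead of crediting them.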

Anonymous   
Printed
Page 273-275

On page 273, I noted that "we want to use the number of outs played to
calculate range factor, but this information is not available for all
players during all seasons."

The number of outs played is only available for pitchers before the
year 2000. Therefore, all range factor calculations for the other
defensive players are approximations. For the year 2000 and later,
the only missing values are for designated hitters.

So, the analysis presented on page 274 is flawed. I repeated the analysis
shown on this page for the year 2000 and later. Here is the code
to run this analysis:

# create a subset of players who played more than 6 innings,
# and of seasons after 1999:
f_and_t.h54 <- subset(f_and_t, f_and_t$qualify & f_and_t$yearID > 1999)

# calculate range factors:
attach(f_and_t.h54)
f_and_t.h54$RF <- (PO + A) / (InnOuts / 3)
detach(f_and_t.h54)
attach(f_and_t.h54)

# show summary statistics by position:
tapply(RF, INDEX=pos, FUN=summary)
$"1B"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.8906 1.0010 1.0410 1.0450 1.0850 1.2090

$"2B"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4339 0.5240 0.5514 0.5512 0.5732 0.6516

$"3B"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2120 0.2780 0.2985 0.2983 0.3191 0.3798

$C
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5551 0.7446 0.7965 0.8051 0.8518 1.0950

$CF
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2182 0.2714 0.2917 0.2933 0.3157 0.3912

$LF
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1154 0.1999 0.2231 0.2225 0.2424 0.2939

$RF
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1579 0.2206 0.2344 0.2349 0.2502 0.2972

$SS
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4129 0.4758 0.4991 0.5022 0.5273 0.6435
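
For readers who want to check a single value by hand, the range factor expression used in the R code above, (PO + A) / (InnOuts / 3), can be sketched in Python. The function name and the sample fielding line are assumptions for illustration only.

```python
def range_factor(po, a, inn_outs):
    """RF = (PO + A) / (InnOuts / 3): putouts plus assists per inning fielded."""
    innings = inn_outs / 3  # InnOuts counts outs, three per inning
    if innings == 0:
        raise ValueError("range factor is undefined for zero innings played")
    return (po + a) / innings


# Hypothetical fielder: 250 putouts and 400 assists over 3600 outs (1200 innings).
rf = range_factor(250, 400, 3600)
print(round(rf, 4))
```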

Anonymous   
Printed
Page 278
Figure 5-14

The figure shown for this hack is not correct. I plotted the histograms
for all regular players (average of 7 or more innings per game)
from 2000-2004 using the following command:

histogram(~RF|pos,xlab="Range Factor", col="grey", data=subset(f_and_t.h54,
f_and_t.h54$pos != "DH"),nint=30)

The corrected figure can be found here:
http://examples.oreilly.com/9780596009427/hist.rf.pdf

Anonymous   
Printed
Page 290
Comment in code at bottom of page

-- no explicit inning count, so we'll use
-- line scores (runs per inning) to
-- determine the number of innings

This method is correct for output from the BGAME.EXE program (see
page 78 for information), but is not correct for the game log
files available from Retrosheet.

The Retrosheet data includes the game length in outs for all games played
in 1962 and later. I called this field "LengthInOuts."

The correct code to use here is:

create temporary table scores as
select ceiling(LengthInOuts / 6) as VisitorInnings,
ceiling((LengthInOuts - 3) / 6) as HomeInnings,
HomeRunsScore, VisitorRunsScored,
HomeTeam, VisitingTeam as AwayTeam
from ALLGL
where floor(Date/10000) = 2003;
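
The arithmetic in this query can be traced with a short Python sketch. It assumes, as the query does, that a full inning takes six outs and that the visiting team bats first; the function names and the sample values are illustrative and not part of the book's code.

```python
import math


def innings_batted(length_in_outs):
    """Return (visitor_innings, home_innings) from the total outs in a game."""
    visitor = math.ceil(length_in_outs / 6)
    # The home team bats second, so it trails the visitors by one half-inning (3 outs).
    home = math.ceil((length_in_outs - 3) / 6)
    return visitor, home


def season(date):
    """Extract the year from a yyyymmdd integer, like floor(Date/10000) in the query."""
    return date // 10000


print(innings_batted(54))  # full nine-inning game
print(innings_batted(51))  # home team did not bat in the ninth
print(season(20030401))
```

A 51-out game gives the visitors nine innings and the home team eight, which is the case a fixed nine-inning assumption would get wrong.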

Based on the corrections, the table shown on page 291 changes very slightly.
The following table shows the result of the new calculations:

mysql> select t.teamID as ID, t.park, round(p.pf, 3) as pf
-> from pf p inner join bbdatabank.teams t
-> on p.HomeTeam=t.teamID
-> where t.yearID=2003;

+-----+---------------------------------------+-------+
| ID  | park                                  | pf    |
+-----+---------------------------------------+-------+
| ANA | Edison International Field            | 0.877 |
| ARI | Bank One Ballpark                     | 1.213 |
| ATL | Turner Field                          | 0.928 |
| BAL | Oriole Park at Camden Yards           | 0.890 |
| BOS | Fenway Park II                        | 1.094 |
| CHA | U.S. Cellular Field                   | 0.968 |
| CHN | Wrigley Field                         | 0.969 |
| CIN | Great American Ball Park              | 0.998 |
| CLE | Jacobs Field                          | 0.856 |
| COL | Coors Field                           | 1.246 |
| DET | Comerica Park                         | 0.902 |
| FLO | Pro Player Stadium                    | 0.869 |
| HOU | Minute Maid Park                      | 1.070 |
| KCA | Royals Stadium                        | 1.248 |
| LAN | Dodger Stadium                        | 0.867 |
| MIL | Miller Park                           | 1.052 |
| MIN | Hubert H Humphrey Metrodome           | 1.034 |
| MON | Stade Olympique,Hiram Bithorn Stadium | 1.383 |
| NYA | Yankee Stadium II                     | 0.930 |
| NYN | Shea Stadium                          | 0.967 |
| OAK | Oakland Coliseum                      | 0.866 |
| PHI | Veterans Stadium                      | 0.897 |
| PIT | PNC Park                              | 0.981 |
| SDN | Qualcomm Stadium                      | 0.829 |
| SEA | Safeco Field                          | 0.940 |
| SFN | PacBell Park                          | 0.996 |
| SLN | Busch Stadium II                      | 0.921 |
| TBA | Tropicana Field                       | 1.001 |
| TEX | The Ballpark at Arlington             | 1.223 |
| TOR | Skydome                               | 1.103 |
+-----+---------------------------------------+-------+
30 rows in set (0.01 sec)

Anonymous