Errata for Building Linux Clusters

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date submitted
Printed Page 56

Should read 40MB/s, not 10Mbits, which is off by a factor of 32.
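
The factor of 32 follows from converting bytes to bits. A minimal sketch of the arithmetic, assuming both figures are per-second rates and that MB and Mbit use the same prefix (neither is stated in the erratum; the function name is illustrative):

```python
BITS_PER_BYTE = 8

def mbytes_to_mbits(mbytes_per_s):
    """Convert a rate in MB/s to Mbit/s (same prefix on both sides)."""
    return mbytes_per_s * BITS_PER_BYTE

corrected_mbits = mbytes_to_mbits(40)   # 40 MB/s expressed in Mbit/s
printed_mbits = 10                      # figure as printed (assumed to be Mbit/s)
factor = corrected_mbits / printed_mbits

print(corrected_mbits, factor)
```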

Anonymous   
Printed Page 69

Fig 3-6 should show the physical implementation of the hypercube architecture
given in the corrected Fig 3-5. However, the internode connections are not
point-to-point in all cases: some 'T' connections are indicated, which is not
possible, for example the connections to Nodes 0, 4, 8, and 12, plus the
connections to the router. In addition, the entire right-hand side of the
figure is missing. The textual description in para 2 on page 68 does not match
the topology shown in Fig 3-6. The text says "...this would require four
16-port hubs..." whereas Fig 3-6 would show sixteen 4-port hubs, were the
right-hand side present.
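
To illustrate what "point to point" means for a hypercube, here is a minimal sketch that enumerates the links. It assumes a 16-node (4-dimensional) hypercube with nodes numbered 0 to 15 in the usual binary scheme, where two nodes are linked when their numbers differ in exactly one bit. The book's numbering may differ, and the function name is illustrative.

```python
def hypercube_links(dimension):
    """Return every point-to-point link in a hypercube of 2**dimension nodes."""
    links = []
    for node in range(2 ** dimension):
        for bit in range(dimension):
            neighbor = node ^ (1 << bit)   # flip one bit of the node number
            if node < neighbor:            # record each link only once
                links.append((node, neighbor))
    return links

links = hypercube_links(4)                 # assumed: 16 nodes, 4 links per node
neighbors_of_0 = sorted(b for a, b in links if a == 0)

print(len(links))        # total number of links
print(neighbors_of_0)    # nodes directly connected to node 0
```

Under these assumptions every link joins exactly two nodes, so a 'T' junction joining three endpoints has no counterpart in the topology.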

Anonymous   
Printed Page 92
5th Paragraph

The Watt is a time-factored unit (Joules/sec); therefore it is nonsensical
to say the cluster consumes 12kW per hour, and just plain wrong to say it
consumes 288kW per day. It, in fact, consumes 288kWh per day (multiplying
by hours removes the time factor, so the kWh is then once again a measure
of energy, like the Joule). The footnote seems to get this right, but the
body of the text doesn't.

To clarify, the cluster is consuming up to 12kW at any given moment in
time, which equates to 12kWh per hour (kW*h/h seems silly, but there you
go), or 288kWh per day. (1 kWh is equal to 3600000J).

There is also confusion over time-factored units with the BTU on page 91 -
a BTU is another unit of energy, not power. Therefore, what you are
calculating by multiplying the power consumption in Watts by 3.412 is power
consumption in BTU/h. This is correctly stated in the last sentence of
page 91 but not in the text leading up to it.
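
The unit bookkeeping above can be checked with a short sketch. The 12 kW figure and the 3.412 factor come from this erratum; the function and constant names are illustrative only.

```python
JOULES_PER_KWH = 3_600_000       # 1 kWh = 1000 W * 3600 s
BTU_PER_HOUR_PER_WATT = 3.412    # conversion factor quoted in the erratum

def energy_kwh(power_kw, hours):
    """Energy (kWh) = power (kW) multiplied by time (h)."""
    return power_kw * hours

def heat_btu_per_hour(power_watts):
    """Power expressed in BTU/h; still a rate, not an amount of energy."""
    return power_watts * BTU_PER_HOUR_PER_WATT

daily_kwh = energy_kwh(12, 24)              # energy used in one day at 12 kW
daily_joules = daily_kwh * JOULES_PER_KWH   # the same energy in Joules
heat = heat_btu_per_hour(12_000)            # 12 kW expressed in BTU/h

print(daily_kwh, daily_joules, heat)
```

The first function returns energy and the second returns power, which is the distinction the body text on pages 91 and 92 blurs.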

Anonymous   
Printed Page 123
Fig 5-6

Figure 5-6 is the wrong picture.

Anonymous   
Printed Page 133
Ex. 1

Example 1 has master when it should have cluster.

Anonymous   
Printed Page 133
In the fifth paragraph

My slave nodes could not get their information via DHCPD with the supplied
hosts file. I needed to add the full domain name, as well.

10.0.2.2 node2 node2.cluster.zeitgeist.com

instead of just:

10.0.2.2 node2

Anonymous   
Printed Page 150

Insert space between the 2 directories.

Anonymous   
Printed Page 156
Chapter 6, Figure 6-3

Some of the scripts listed in the hierarchy on this page are missing from the
CD. Could someone put these on the website or mail them, if available, please?

Anonymous   
Printed Page 157
figure 6-4

The menu that I get has 'Project / Group Management' instead of 'Group
Management' and there is a 'Batch Queue Management' entry on my menu that
is not shown in figure 6-4. Also, there is no "Add Cluster" icon on any of
the management menus.

Anonymous   
Printed Page 158
1st

In Netscape, I am trying to start 'NewCluster.cgi' (see page 6.5).
After clicking on 'Cluster Management' or 'Group Management' I do get the
HTML page, but starting one of the CGIs gives an error message:
'The requested URL .... was not found on this server.'

Do I have to make additional settings on the httpd server, and if so, what are they?
If not, what could be the problem?

Anonymous   
Printed Page 187
LAM paragraph

The trailing slash is missing at the end of the LAM URL. It should be:

http://www.mpi.nd.edu/lam/

not:

http://www.mpi.nd.edu/lam

True, both URLs will work, but the latter causes an additional hit on our
already-beleaguered web server, and is not technically correct (because this
is a directory URL, not a file URL).

Anonymous   
Printed Page 202
First paragraph of LAM section

As I pointed out in a previous errata item (sorry, I forgot to explicitly
mention this one in the previous item), LAM is ***not*** based on MPICH at all.

MPI: paper document specifying the library API

MPICH: implementation of the MPI API from Argonne National Labs and
Mississippi State University

LAM/MPI: implementation of the MPI API originally from the Ohio
Supercomputing Center, and now developed/maintained by the University of
Notre Dame

Anonymous   
Printed Page 203
Last paragraph

I understand the point that you're trying to make, but saying "LAM extends
MPI by providing a complete execution environment on that sits on top of
the MPI libraries..." is not correct.

LAM's execution environment is actually *underneath* its implementation
of MPI -- LAM *provides* the execution environment for MPI programs (hence,
our formal name "LAM/MPI" when discussed in MPI contexts). LAM is actually
a collection of cluster-based tools that includes an MPI layer at the very
top. Hence, LAM/MPI's MPI layer draws upon lower-level LAM services to
execute.

Anonymous   
Printed Page 203
First paragraph

The MPI standard does ***not*** "implement the APIs that define how
messages and data are passed..." as is stated in this paragraph.

The MPI standard is ***only*** a document specifying the API. It does not
implement the API at all. The document was created by the MPI Forum (which
was a collection of vendors and academics) such that implementations
*could* be (and have been) written by most of the members of the Forum
(which is pretty much what happened). That is, many of the members of the
MPI Forum went off and wrote their own MPI implementations (or based them
upon freeware implementations).

Hence, it is incorrect to say that the MPI standard implements *anything*
-- it doesn't. Saying that implies that there is one implementation that
all others are based off (which seems to be what you are implying when you
say that LAM is based on MPICH, for example). And this is simply not
true. While there certainly is inbreeding between the various MPI
implementations that exist, all the mainstream MPI implementations are
currently (for the most part) unrelated to each other.

As a sidenote (and I alluded to this in a previous errata item), many
vendor implementations of MPI were originally based upon the MPICH
implementation (hence, MPICH was sometimes called "the reference
implementation"). However, all vendors who have done this have now moved
away from the MPICH model and essentially re-implemented their MPI from
scratch. This allowed them to get much higher performance than MPICH
allowed for because the original MPICH models were aimed at portability,
not necessarily performance. Hence, it is probably safe to say that all
mainstream MPI implementations that exist, while they may be somewhat
related and share small portions of code, are essentially independent
implementations.

Anonymous   
Printed Page 204
First 2 full paragraphs

We use the term "boot schema" instead of "nodes file", although the way you
describe it is essentially correct. I understand you wanting to make the
term as simple as possible for your readers, but they may be confused if
they read your book and then look at our documentation.

Additionally, an application schema file is almost always *not* necessary to
run a parallel program under LAM/MPI. An application schema file is only
necessary if you need to specify different executables or options to
different ranks in the parallel job. For example, if you wanted to launch
a "master" process and several "slave" processes, you would use an
application schema.

The far more common case -- launching a single executable on all nodes in
the parallel job -- is much simpler; there is no need for an application
schema. Indeed, it is a single command:

mpirun -np 8 myprogram

where "myprogram" will be launched on 8 nodes.

Anonymous