Errata

Errata for Building Linux Clusters

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version	Location	Description	Submitted by
Printed	Page 56	Should read 40 MB/s, not 10 Mbit/s, which is off by a factor of 32.	Anonymous
Printed	Page 69	Fig 3-6 should show the physical implementation of the hypercube architecture given in the corrected Fig 3-5. However, the internode connections are not point-to-point in all cases: some 'T' connections are indicated, which is not possible (for example, the connections to Nodes 0, 4, 8 and 12, plus the connections to the router). In addition, the entire right-hand side of the figure is missing. The textual description in para 2 on page 68 does not match the topology shown in Fig 3-6. The text says "...this would require four 16-port hubs..." whereas Fig 3-6 would show sixteen 4-port hubs, were the right-hand side present.	Anonymous
Printed	Page 92 5th Paragraph	The Watt is a time-factored unit (Joules/sec); therefore it is nonsensical to say the cluster consumes 12kW per hour, and just plain wrong to say it consumes 288kW per day. It, in fact, consumes 288kWh per day (multiplying by hours removes the time factor, so the kWh is then once again a measure of energy, like the Joule). The footnote seems to get this right, but the body of the text doesn't. To clarify, the cluster is consuming up to 12kW at any given moment in time, which equates to 12kWh per hour (kW*h/h seems silly, but there you go), or 288kWh per day. (1 kWh is equal to 3,600,000 J.) There is also confusion over time-factored units with the BTU on page 91: a BTU is another unit of energy, not power. Therefore, what you are calculating by multiplying the power consumption in Watts by 3.412 is power consumption in BTU/h. This is correctly stated in the last sentence of page 91 but not in the text leading up to it.	Anonymous
Printed	Page 123 Fig 5-6	Figure 5-6 is the wrong picture.	Anonymous
Printed	Page 133 Ex. 1	Example 1 has master when it should have cluster.	Anonymous
Printed	Page 133 In the fifth paragraph	My slave nodes could not get their information via DHCPD with the supplied hosts file. I needed to add the full domain name as well, that is: "10.0.2.2 node2 node2.cluster.zeitgeist.com" instead of just: "10.0.2.2 node2".	Anonymous
Printed	Page 150	Insert space between the 2 directories.	Anonymous
Printed	Page 156 Chapter 6, Figure 6-3	Some of the scripts listed in the hierarchy page are missing from the CD. Could someone put these on the website or mail them if available, please?	Anonymous
Printed	Page 157 figure 6-4	The menu that I get has 'Project / Group Management' instead of 'Group Management' and there is a 'Batch Queue Management' entry on my menu that is not shown in figure 6-4. Also, there is no "Add Cluster" icon on any of the management menus.	Anonymous
Printed	Page 158 1th	In Netscape, I am trying to start the 'NewCluster.cgi' (see page 6.5). After clicking on 'Cluster Management' or 'Group Management' I do get the HTML page, but starting one of the CGI scripts gives an error message: 'The requested URL .... was not found on this server.' Do I have to make additional settings on the httpd server, and what are these? If not, what could be the problem?	Anonymous
Printed	Page 187 LAM paragraph	The trailing slash is missing at the end of the LAM URL. It should be: http://www.mpi.nd.edu/lam/ not: http://www.mpi.nd.edu/lam True, both URLs will work, but the latter causes an additional hit on our already-beleaguered web server, and is not technically correct (because this is a directory URL, not a file URL).	Anonymous
Printed	Page 202 First paragraph of LAM section	As I pointed out in a previous errata item (sorry, I forgot to explicitly mention this one in the previous item), LAM is *not* based on MPICH at all. MPI: paper document specifying the library API; MPICH: implementation of the MPI API from Argonne National Labs and Mississippi State University; LAM/MPI: implementation of the MPI API originally from the Ohio Supercomputing Center, and now developed/maintained by the University of Notre Dame.	Anonymous
Printed	Page 203 Last paragraph	I understand the point that you're trying to make, but saying "LAM extends MPI by providing a complete execution environment on that sits on top of the MPI libraries..." is not correct. LAM's execution environment is actually underneath its implementation of MPI -- LAM provides the execution environment for MPI programs (hence, our formal name "LAM/MPI" when discussed in MPI contexts). LAM is actually a collection of cluster-based tools that includes an MPI layer at the very top. Hence, LAM/MPI's MPI layer draws upon lower-level LAM services to execute.	Anonymous
Printed	Page 203 First paragraph	The MPI standard does *not* "implement the APIs that define how messages and data are passed..." as is stated in this paragraph. The MPI standard is *only* a document specifying the API. It does not implement the API at all. The document was created by the MPI Forum (which was a collection of vendors and academics) such that implementations could be (and have been) written by most of the members of the Forum (which is pretty much what happened). That is, many of the members of the MPI Forum went off and wrote their own MPI implementations (or based them upon freeware implementations). Hence, it is incorrect to say that the MPI standard implements anything -- it doesn't. Saying that implies that there is one implementation that all others are based on (which seems to be what you are implying when you say that LAM is based on MPICH, for example). And this is simply not true. While there certainly is inbreeding between the various MPI implementations that exist, all the mainstream MPI implementations are currently (for the most part) unrelated to each other. As a sidenote (and I alluded to this in a previous errata item), many vendor implementations of MPI were originally based upon the MPICH implementation (hence, MPICH was sometimes called "the reference implementation"). However, all vendors who have done this have now moved away from the MPICH model and essentially re-implemented their MPI from scratch. This allowed them to get much higher performance than MPICH allowed for, because the original MPICH models were aimed at portability, not necessarily performance. Hence, it is probably safe to say that all mainstream MPI implementations that exist, while they may be somewhat related and share small portions of code, are essentially independent implementations.	Anonymous
Printed	Page 204 First 2 full paragraphs	We use the term "boot schema" instead of "nodes file", although the way you describe it is essentially correct. I understand you wanting to make the term as simple as possible for your readers, but they may be confused if they read your book and then look at our documentation. Additionally, an application schema file is almost never necessary to run a parallel program under LAM/MPI. An application schema file is only necessary if you need to specify different executables or options to different ranks in the parallel job. For example, if you wanted to launch a "master" process and several "slave" processes, you would use an application schema. The far more common case -- launching a single executable on all nodes in the parallel job -- is much simpler; there is no need for an application schema. Indeed, it is a single command: "mpirun -np 8 myprogram", where "myprogram" will be launched on 8 nodes.	Anonymous
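
The unit arithmetic behind the entries for pages 56, 91 and 92 can be checked with a short sketch. This is only an illustration of the customers' reasoning, not code from the book: the function and constant names are made up for this example, and the page 56 comparison assumes the figures are 40 MB/s and 10 Mbit/s.

```python
# Illustrative sketch only: names below are hypothetical, not from the book.

WATT_TO_BTU_PER_HOUR = 3.412   # factor quoted in the page 91/92 entry
JOULES_PER_KWH = 3_600_000     # 1 kWh = 3,600,000 J, as stated in that entry
BITS_PER_BYTE = 8


def energy_kwh(power_kw, hours):
    """Energy in kWh used by a load drawing power_kw for the given hours."""
    return power_kw * hours


def heat_btu_per_hour(power_watts):
    """Power expressed in BTU/h (a rate, not an amount of energy)."""
    return power_watts * WATT_TO_BTU_PER_HOUR


def megabytes_to_megabits(megabytes_per_s):
    """Convert a rate in MB/s to Mbit/s."""
    return megabytes_per_s * BITS_PER_BYTE


if __name__ == "__main__":
    daily = energy_kwh(12, 24)
    print("Energy per day (kWh):", daily)
    print("Energy per day (J):", daily * JOULES_PER_KWH)
    print("Heat load (BTU/h):", round(heat_btu_per_hour(12_000), 1))
    # Page 56: ratio of 40 MB/s to 10 Mbit/s (assumed units).
    print("Bandwidth ratio:", megabytes_to_megabits(40) / 10)
```

A 12 kW load running for 24 hours gives 288 kWh, matching the figure in the page 92 entry, and 40 MB/s is 320 Mbit/s, which is 32 times 10 Mbit/s.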