Building Linux Clusters
by David HM Spector

Unconfirmed error reports are from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader.

This page was updated February 8, 2001.

Here's a key to the markup:
[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

UNCONFIRMED errors and suggestions from readers:

[56] Should read 40MB/s, not 10Mbits, which is off by a factor of 32.

[69] Fig 3-6 should show the physical implementation of the hypercube architecture given in the corrected Fig 3-5. However, the internode connections are not point to point in all cases: some 'T' connections are indicated, which cannot be possible, for example the connections to Nodes 0, 4, 8 and 12 plus the connections to the router. In addition, the entire right-hand side of the figure is missing. The textual description in para 2 on page 68 does not match the topology shown in Fig 3-6. The text says "...this would require four 16-port hubs..." and Fig 3-6 would show sixteen 4-port hubs, were the right-hand side present.

{92} 5th paragraph; The watt is a time-factored unit (joules/sec); therefore it is nonsensical to say the cluster consumes 12kW per hour, and just plain wrong to say it consumes 288kW per day. It, in fact, consumes 288kWh per day (multiplying by hours removes the time factor, so the kWh is then once again a measure of energy, like the joule). The footnote seems to get this right, but the body of the text doesn't. To clarify, the cluster is consuming up to 12kW at any given moment in time, which equates to 12kWh per hour (kW*h/h seems silly, but there you go), or 288kWh per day. (1 kWh is equal to 3600000J.) There is also confusion over time-factored units with the BTU on page 91: a BTU is another unit of energy, not power.
Therefore, what you are calculating by multiplying the power consumption in watts by 3.412 is power consumption in BTU/h. This is correctly stated in the last sentence of page 91 but not in the text leading up to it.

<123> Figure 5-6 is the wrong picture.

[133] Example 1 has master when it should have cluster.

{133} In the fifth paragraph; My slave nodes could not get their information via DHCPD with the supplied hosts file. I needed to add the full domain name as well:
    10.0.2.2 node2 node2.cluster.zeitgeist.com
instead of just:
    10.0.2.2 node2

{150} Insert space between the 2 directories.

[156] Chapter 6, Figure 6-3; Some of the scripts listed on the hierarchy page are missing from the CD. Could someone put these on the website or mail them, if available, please?

[157] Figure 6-4; The menu that I get has 'Project / Group Management' instead of 'Group Management', and there is a 'Batch Queue Management' entry on my menu that is not shown in Figure 6-4. Also, there is no "Add Cluster" icon on any of the management menus.

[158] 1st paragraph; In Netscape, I am trying to start the 'NewCluster.cgi' (see page 6.5). After clicking on 'Cluster Management' or 'Group Management' I do get the HTML page, but starting one of the CGIs gives an error message: 'The requested URL .... was not found on this server.' Do I have to make additional settings on the httpd server, and what are these? If not, what could be the problem?

{187} LAM paragraph; The trailing slash is missing at the end of the LAM URL. It should be:
    http://www.mpi.nd.edu/lam/
not:
    http://www.mpi.nd.edu/lam
True, both URLs will work, but the latter causes an additional hit on our already-beleaguered web server, and is not technically correct (because this is a directory URL, not a file URL).

[202] First paragraph of LAM section; As I pointed out in a previous errata item (sorry, I forgot to explicitly mention this one in the previous item), LAM is ***not*** based on MPICH at all.
MPI: paper document specifying the library API
MPICH: implementation of the MPI API from Argonne National Labs and Mississippi State University
LAM/MPI: implementation of the MPI API originally from the Ohio Supercomputing Center, and now developed/maintained by the University of Notre Dame

{203} Last paragraph; I understand the point that you're trying to make, but saying "LAM extends MPI by providing a complete execution environment on that sits on top of the MPI libraries..." is not correct. LAM's execution environment is actually *underneath* its implementation of MPI -- LAM *provides* the execution environment for MPI programs (hence, our formal name "LAM/MPI" when discussed in MPI contexts). LAM is actually a collection of cluster-based tools that includes an MPI layer at the very top. Hence, LAM/MPI's MPI layer draws upon lower-level LAM services to execute.

[203] First paragraph; The MPI standard does ***not*** "implement the APIs that define how messages and data are passed..." as is stated in this paragraph. The MPI standard is ***only*** a document specifying the API. It does not implement the API at all. The document was created by the MPI Forum (which was a collection of vendors and academics) such that implementations *could* be (and have been) written by most of the members of the Forum (which is pretty much what happened). That is, many of the members of the MPI Forum went off and wrote their own MPI implementations (or based them upon freeware implementations). Hence, it is incorrect to say that the MPI standard implements *anything* -- it doesn't. Saying that implies that there is one implementation that all others are based on (which seems to be what you are implying when you say that LAM is based on MPICH, for example). And this is simply not true. While there certainly is inbreeding between the various MPI implementations that exist, all the mainstream MPI implementations are currently (for the most part) unrelated to each other.
As a sidenote (and I alluded to this in a previous errata item), many vendor implementations of MPI were originally based upon the MPICH implementation (hence, MPICH was sometimes called "the reference implementation"). However, all vendors who have done this have now moved away from the MPICH model and essentially re-implemented their MPI from scratch. This allowed them to get much higher performance than MPICH allowed for, because the original MPICH models were aimed at portability, not necessarily performance. Hence, it is probably safe to say that all mainstream MPI implementations that exist, while they may be somewhat related and share small portions of code, are essentially independent implementations.

{204} First 2 full paragraphs; We use the term "boot schema" instead of "nodes file", although the way you describe it is essentially correct. I understand you wanting to make the term as simple as possible for your readers, but they may be confused if they read your book and then look at our documentation. Additionally, an application schema file is almost always *not* necessary to run a parallel program under LAM/MPI. An application schema file is only necessary if you need to specify different executables or options to different ranks in the parallel job. For example, if you wanted to launch a "master" process and several "slave" processes, you would use an application schema. The far more common case -- launching a single executable on all nodes in the parallel job -- is much simpler; there is no need for an application schema. Indeed, it is a single command:
    mpirun -np 8 myprogram
where "myprogram" will be launched on 8 nodes.
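
The rule of thumb in the {204} report can be sketched as a small Python helper. This is only an illustration and not part of LAM/MPI: the function names are hypothetical, each rank is assumed to be described by a list holding its executable and options, and the code only builds the argument list for the single-executable case without running anything.

```python
def needs_app_schema(rank_commands):
    """Return True when at least two ranks differ in executable or options."""
    return len({tuple(cmd) for cmd in rank_commands}) > 1


def uniform_mpirun_argv(rank_commands):
    """Build the `mpirun -np N program` argument list for the uniform case."""
    if not rank_commands:
        raise ValueError("at least one rank is required")
    if needs_app_schema(rank_commands):
        raise ValueError("ranks differ; an application schema is needed")
    return ["mpirun", "-np", str(len(rank_commands)), *rank_commands[0]]


# Eight identical ranks: the common case described in the report.
uniform = [["myprogram"]] * 8
# One "master" and seven "slave" processes: the ranks differ.
mixed = [["master"]] + [["slave"]] * 7

print(" ".join(uniform_mpirun_argv(uniform)))  # mpirun -np 8 myprogram
print(needs_app_schema(mixed))                 # True
```

For the mixed case the helper deliberately stops at reporting that an application schema is needed; the schema file format itself is left to the LAM/MPI documentation.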