Understanding the Linux Kernel
By Daniel P. Bovet and Marco Cesati

Unconfirmed error reports are from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader.

This page was updated October 29, 2002.

Here's a key to the markup:
[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

UNCONFIRMED errors and suggestions from readers:

[15] Second to last line; I think ID should be GID:

{17} Bullet 4, line 2; "each file" should be "each process".

(17) 6th paragraph; "each file" -> "each process"?

{20} First paragraph, last sentence; Should not "As a consequence of the interrupt, Process 2 SWITCHES TO KERNEL MODE AND services the interrupt." read "As a consequence of the interrupt, Process 2 IS SUSPENDED WHILE THE CPU SWITCHES TO KERNEL MODE AND THE KERNEL (?:PROCESS (?:MANAGER)?)? services the interrupt."? Not sure.

{25} Semaphores section; The text says: "The up() method increments the value of the semaphore and, if its value is greater than or equal to 0, reactivates one or more processes in the semaphore list." I think "greater" should be "less".

(26) At the bottom; It says Ctrl-C sends a SIGTERM to a process, when in fact it sends a SIGINT.

[26] 3rd paragraph; "it starts an endless loop": Not if interrupts are enabled and preemption is allowed. As explained on p. 280, Linux is non-preemptive in the kernel, so this is true in the Linux kernel environment, but only because of special considerations explained much later in the book. In fact, spin-locks have been used historically in uniprocessor preemptive systems.

{47} Paragraph 2; Line 4: "Page Directory" should be "Page Table". Line 5: "Page Table" should be "Page".
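To make the {25} report easier to weigh, here is a minimal sketch of one common counting-semaphore convention, in which a negative count records how many processes are sleeping. The names toy_sem, toy_down, and toy_up are hypothetical, and the convention is an assumption of this sketch rather than a statement of what the book or the kernel does, so it does not settle the report either way.

```c
#include <assert.h>

/* Hypothetical toy semaphore (not kernel code). Assumed convention:
 * count > 0 means free resources, count < 0 means -count sleepers. */
struct toy_sem {
    int count;
    int woken;   /* number of wakeups issued by toy_up(), for illustration */
};

/* down(): take the resource; returns 1 if the caller would have to sleep. */
static int toy_down(struct toy_sem *s)
{
    assert(s != 0);
    s->count--;
    return s->count < 0;
}

/* up(): release the resource; returns 1 if a sleeper is woken. */
static int toy_up(struct toy_sem *s)
{
    assert(s != 0);
    s->count++;
    if (s->count <= 0) {   /* the count was negative, so someone was sleeping */
        s->woken++;
        return 1;
    }
    return 0;
}
```

Under this convention the wakeup test in up() is "less than or equal to 0" after the increment; a convention that never lets the count go negative would use a different test.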
[55] 1st paragraph; Dear Sirs, I think there might be an error in the sentence: "The kernel keeps a position for the Page Middle Directory by setting the number of entries in it to 1 and (?mapping the single into the proper entry of the Page Global Directory?)". The part of the sentence that I put in parentheses might be wrong. In my opinion it should be replaced by "pointing with the single into the proper Page Table", because an entry in the Middle Directory points to the Page Table and not to the Page Global Directory. I think it works this way for IA-32 processors: an entry in the Page Global Directory points to the Page Middle Directory, which has only one entry in it. This entry in the Middle Directory then contains the address of the proper Page Table. This means that the descriptors that would normally be in the Page Directory for two-level paging were shifted to the Page Middle Directory for three-level paging, and the descriptors in the Page Global Directory therefore point to the Middle Directory. Thanks for your response or an explanation if I am wrong!

{57} 3rd paragraph; "...values 4, 512, and 512, respectively..." should be "...values 512, 512, and 4, respectively". The current text contradicts the sequence described in the first line of the same paragraph.

{64} 1st paragraph; The original code is:
    address = 0;
    pg_dir = swapper_pg_dir;
    ...
It should be corrected to:
    address = PAGE_OFFSET;
    pg_dir = swapper_pg_dir;
    ...

(66) Figure 3-1; replace "p_optr" with "p_opptr".

{69} 3rd paragraph; It appears the value of 0x015fc000 for the esp should either be changed to reflect the diagram, or the esp in Figure 3-2 should be pointing to the top of the stack.

{69} Figure 3-2; The process descriptor should be in kernel memory, but the addresses used in the example are clearly below PAGE_OFFSET, i.e., are user addresses.
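As background for the {57} report, the following sketch splits a 32-bit linear address the way three-level paging does on IA-32 with the Physical Address Extension: 2 bits select one of 4 Page Global Directory entries, 9 bits one of 512 Page Middle Directory entries, 9 bits one of 512 Page Table entries, and 12 bits the offset. The struct and function names are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the bit widths are those of IA-32 PAE paging. */
struct demo_split {
    unsigned pgd;     /* index into the Page Global Directory (4 entries)   */
    unsigned pmd;     /* index into the Page Middle Directory (512 entries) */
    unsigned pte;     /* index into the Page Table (512 entries)            */
    unsigned offset;  /* byte offset inside the 4 KB page                   */
};

static struct demo_split demo_split_pae(uint32_t addr)
{
    struct demo_split s;

    s.pgd    = addr >> 30;            /* top 2 bits  */
    s.pmd    = (addr >> 21) & 0x1ff;  /* next 9 bits */
    s.pte    = (addr >> 12) & 0x1ff;  /* next 9 bits */
    s.offset = addr & 0xfff;          /* low 12 bits */
    assert(s.pgd < 4 && s.pmd < 512 && s.pte < 512);
    return s;
}
```

Listed from the Page Table upward, the entry counts are 512, 512, and 4; listed from the Page Global Directory downward, they are 4, 512, and 512.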
[82] Paragraph 7 (1.); `If it is set to 3 (...), it performs the next check; otherwise, it raises a "General protection error" exception.' should be changed to `If it is set to 3 (...), it grants access; otherwise, it performs the next check.' Source: Intel(R) Architecture Software Developer's Manual, Volume 1: Basic Architecture, Order number 245470, Intel, 2001, pp. 12-5: `If in protected mode and the CPL is less than or equal to the current IOPL, the processor allows all I/O operations to proceed.'

[85] Point e of step 7; Why is 920 added to the values of ebx and ecx (the registers contain pointers to the next and prev processes, respectively)?

[85] Point 7e; The explanation of the assembler code says: "In practice, the check is made by referring to the tss.segments field (at offset 112 in the process descriptor)..." Everything in the quoted sentence is false. There is no such thing as tss.segments, and it is not situated at offset 112 of the process descriptor. What we DO have is the following: a pointer to mm_struct, which is situated at offset 920 in the process descriptor (this is the reason for the "movl 920(%ebx), %edx" and "movl 920(%ecx), %eax" part of the code). In this structure, at offset 112, one can find segments, but not in tss. As for "In practice...": as one can read in the code (the mm_struct declaration), this practice applies only to Intel x86 machines. So I think that a more appropriate wording is "On Intel machines...".

{90} Entry CLONE_FS; replace "The table that identifies the root directory and the current working directory." with "The table that identifies the root directory and the current working directory, as well as the value of the bit masks used to set the initial file permissions of a new file."

{91} Paragraphs 5, 6 and 8; Paragraph 5, line 1: "first" should be "third". Paragraph 6, line 2: "first" should be "third". Paragraph 8, line 2: "null" should be "0".
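The order of checks discussed in the [82] report can be sketched as follows, based on the Intel manual passage the reader quotes: if the CPL is less than or equal to the IOPL, access is granted; otherwise the I/O permission bitmap decides. The function and type names are hypothetical, the sketch covers single-byte port accesses only, and the bitmap is passed in as a plain array rather than read from a TSS.

```c
#include <assert.h>
#include <stdint.h>

enum demo_io_result { DEMO_IO_ALLOWED, DEMO_IO_GP_FAULT };

/* Hypothetical model of the protected-mode I/O check for a one-byte access.
 * bitmap holds one bit per port; a set bit denies access. nports is the
 * number of ports the bitmap covers (an assumption of this sketch). */
static enum demo_io_result demo_io_check(unsigned cpl, unsigned iopl,
                                         const uint8_t *bitmap,
                                         unsigned nports, unsigned port)
{
    assert(cpl <= 3 && iopl <= 3);

    if (cpl <= iopl)            /* privileged enough: no further check */
        return DEMO_IO_ALLOWED;
    if (port >= nports)         /* port not covered by the bitmap */
        return DEMO_IO_GP_FAULT;
    if (bitmap[port / 8] & (1u << (port % 8)))
        return DEMO_IO_GP_FAULT;
    return DEMO_IO_ALLOWED;
}
```

With IOPL set to 3, every privilege level passes the first test, which is the behavior the reader's proposed wording describes.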
{97} Numbered paragraph 3; Line 3: "files(), __exit_fs(()" should be "fs(), __exit_files()".

{104} 6th paragraph, point 2b; "b. Stores the vector in an Interrupt Controller I/O port, thus allowing the CPU to read it via the data bus." should be deleted; there is no such I/O port. After step c., add: "Communicates the vector to the CPU in a special bus cycle, after the CPU asserts INTA#, the interrupt acknowledge signal".

(118) 3rd paragraph; "page frames to the process until the last possible fmoment."

{122} Last paragraph (under 'depth'); The last line says "... enable_irq() function decrements the field; if depth becomes 0, the function enables the IRQ line." Actually, if the depth becomes 1 (switch-case 1), the IRQ line is enabled by calling the desc->handler->enable() function. _Then_ the depth is decremented to 0. So ideally, if the depth becomes 0 (switch-case 0), the function should not enable the line again but just print the "unbalanced" error. I think changing the line in the text to "... enable_irq() function decrements the field; if depth becomes 1, the function enables the IRQ line." would avoid all ambiguity.

(145) 4th paragraph; "The function perform the following actions:" should be "The function performs the following actions:"?

[151] 2nd paragraph; The second sentence reads: "The index field specifies the currently scanned ; it is incremented by 1 (modulo 64) every 256^(i-1) ticks.....". The statement after the semicolon is wrong. The index is incremented as follows: in tv1 every unit of time (i.e., every tick); in tv2 every 256 units of time (i.e., every 256 ticks); in tv3 every 16384 (i.e., 2^14) units of time, and 16384 is NOT 256^2 as the given formula would require. For the sake of completeness, in tv4 it is incremented every 1048576 (i.e., 2^20) units of time, which is not 256^3, and in tv5 every 67108864 (i.e., 2^26) units of time, which is not 256^4.
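The arithmetic in the [151] report follows from a layout in which tv1 is indexed by 8 bits and each of tv2 through tv5 by 6 bits. The sketch below assumes that layout (the function name and the two constants are hypothetical stand-ins) and computes how many ticks pass between increments of each level's index.

```c
#include <assert.h>

/* Assumed layout: tv1 has 2^8 lists, tv2..tv5 have 2^6 lists each. */
enum { DEMO_TVR_BITS = 8, DEMO_TVN_BITS = 6 };

/* Number of ticks between two increments of the index of timer vector
 * tv<level>, for level in 1..5. */
static unsigned long demo_tv_period(int level)
{
    assert(level >= 1 && level <= 5);
    if (level == 1)
        return 1UL;   /* tv1 advances on every tick */
    return 1UL << (DEMO_TVR_BITS + DEMO_TVN_BITS * (level - 2));
}
```

Under these assumptions the periods are 1, 2^8, 2^14, 2^20, and 2^26 ticks, which match the reader's figures and differ from 256^(i-1) from tv3 onward.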
(158) 2nd paragraph; Spelling error: "possfsible" should be "possible".

[160] Table 6-1; The title is "Flags ..." and the heading of the first column is "Flag Name", but the code at the top of p. 161 makes it clear that these are in fact shifts used to generate the flags, i.e. lg(flag).

{164} 2nd paragraph (description of the function free_pages(addr,order)); Original: "This function check the page descriptor of the page frame having the physical address addr." Suggested: "This function check the page descriptor of the page frame having the linear address addr." Reason: the corresponding function get_free_pages returns the linear address of the first newly allocated block. This can be concluded from page 169 of the book: "return PAGE_OFFSET+(map_nr<> PAGE_SHIFT)" and from __pa(): "#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)"

[170] 2nd paragraph; The text says: "The mask variable contains the two's complement of 2**order". This is not true. Look at the code in the 1st paragraph: unsigned long mask = (~0UL) << order. What one gets here is (32 - order) bits set to "1" followed by order bits set to "0". It is not the two's complement. Further on in the paragraph one can read: "... and to increment nr_free_pages." The phrase is then illustrated by nr_free_pages -= mask; It looks like a decrement but is indeed an increment, achieved by deliberate use of overflow. So I would restate the quoted phrase as something like: "... and to increment nr_free_pages by means of overflow".

(171) Line 22; 131056 is not correct and should be 131072.

(174) Line 24; kem_slab_t should be kmem_slab_t.

{180} Figure 6-5. Relationships between slab and object descriptors; I think that in the lower part of the figure (Slab with External Object Descriptors), the second and third object descriptors should be exchanged; the u field of the second descriptor should point to the fourth object in the cache (next free object), while the u field of the third descriptor should point to the third object (the allocated object).

{180} 1st paragraph; "First, the c_index_cachep field of the cache containing the slab points to the cache descriptor of the cache containing the object descriptors." should be "First, the c_index_cachep field of the cache containing the objects points to the cache descriptor of the cache containing their object descriptors."

(201) Last paragraph; Line 2: "(of 2)" should be "(base 2)".

{211} 7th paragraph; The line:
    if (addr & 0xfffff000)
should be:
    if (addr & ~PAGE_MASK)
or:
    if (addr & ~0xfffff000)

(212) Last line; Delete last line.

(213) Third displayed code; Looks like line 2 should be current->mm->map_count >= MAX_MAP_COUNT)

(215) Bullets 2 and 3; Bullet 2, line 1: "lower" should be "upper". Bullet 3, line 1: "upper" should be "lower".

(219) Paragraph 6; Line 3: "should be only" should be "could be".

[222] Bullet 2; This should only happen for write access.

{231} 7th paragraph; The sentence that begins "Fans of awk . . ." should begin "Fans of sed . . .".

{241} 1st paragraph; "PAGE_OFFSET-1 for normal processes and the value 0xffffffff" should be corrected to "PAGE_OFFSET-1 for normal processes and the value 0xbfffffff".

[254] Paragraph 2 of Data Structures Associated with Signals; Line 4: "Signal 1 is mapped to bit 1" should be "Signal 1 is mapped to bit 0".

[256] sigmask(nsig); Line 1: "index" should be "mask".

{257} Paragraph 2 of dequeue_signal(mask, info); Line 1: "blocked" should be "nonblocked".

{259} Bullet 1, line 2; "execute" should be "continue or execute".
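The [254] and [256] reports both turn on how a signal number maps to a bit. A minimal sketch, assuming signal numbers start at 1 and the mask fits in a single unsigned long (the demo_ names are hypothetical):

```c
#include <assert.h>

/* Mask with only the bit for signal nsig set: signal 1 maps to bit 0,
 * signal 2 to bit 1, and so on. The range check assumes at most 32
 * signals so the shift is valid on any platform. */
static unsigned long demo_sigmask(int nsig)
{
    assert(nsig >= 1 && nsig <= 32);
    return 1UL << (nsig - 1);
}

/* Returns nonzero if signal nsig is present in the set. */
static int demo_sig_in_set(unsigned long set, int nsig)
{
    return (set & demo_sigmask(nsig)) != 0;
}
```

The function returns a mask rather than a bit index, which is the distinction the [256] report draws.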
(279) Second paragraph; replace "the scheduling parameters" with "the scheduling priorities".

{289} Paragraph 2, line 6; "CPU 2" should be "CPU 1".

{302} Table 11-1. Atomic Operations in C; The description for atomic_dec_and_test(v) states: "Subtract 1 from *v and return 1 if the result is non-null, 0 otherwise." After taking a look at the actual routine, it seems to me that the description should read: "Subtract 1 from *v and return 1 if the result is null, 0 otherwise." The kernel I am referencing is version 2.4.18. The header comment and the code are:
    /*
     * ...
     * Atomically decrements @v by 1 and
     * returns true if the result is 0, or false for all other
     * cases.
     * ...
     */
    static __inline__ int atomic_dec_and_test(atomic_t *v)
    {
        unsigned char c;

        __asm__ __volatile__(
            LOCK "decl %0; sete %1"
            :"=m" (v->counter), "=qm" (c)
            :"m" (v->counter) : "memory");
        return c != 0;
    }
I have also looked at kernel versions 2.0.1 and 2.2.0, and they both implement atomic_dec_and_test() the same way as 2.4.18 (apart from the use of __atomic_fool_gcc(v), which isn't used in 2.4.18). I could also be misreading the book's description, and I apologize if I am incorrect (or if this has been caught already).

{320} Table 11-4; The description of write_unlock_irq(rwlp) should be write_unlock(rwlp); __sti()

(332) Figure 12-2; In the figure, there are two "Process 2"s. I think the second one really was meant to be Process 3, and the text would make more sense that way.

(368) 2nd paragraph; "All lock_file structures ..." should be "All file_lock structures ...".

[390] 6th paragraph; The code
    io_mem = ioremap(0xfb000000, 0x200000);
should be
    io_mem = ioremap(0xfbf00000, 0x200000);
so that the following code
    t2 = *((unsigned char *)(io_mem + 0x100000));
can read the memory location having the 0xfc000000 address.

(467) 2.; The last sentence is incomplete.

{468} Last paragraph, line 2; Should start: "page slots to be allocated. This field is reset to SWAPFILE_CLUSTER when ..."
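The semantics described in the {302} report can be sketched in portable C11 with <stdatomic.h>. This is not the kernel's implementation; the function name is hypothetical and the sketch only mirrors the behavior stated in the quoted 2.4.18 header comment, namely returning true exactly when the decremented value is 0.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically subtract 1 from *v; return 1 if the new value is 0,
 * and 0 in all other cases. */
static int demo_atomic_dec_and_test(atomic_int *v)
{
    assert(v != 0);
    /* atomic_fetch_sub returns the value held before the subtraction,
     * so the new value is 0 exactly when the old value was 1. */
    return atomic_fetch_sub(v, 1) == 1;
}
```

A typical use is reference counting, where the caller that sees a return value of 1 dropped the last reference and is the one responsible for freeing the object.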
(490) Item d; delete "on the next page".

[490] 2nd bulleted item in item c; replace "the page slot usage" with "the page usage".

{498} Figure 17-1 at the top of the page; The diagram of an ext2 block group says the group descriptors occupy n block (not plural). But from the description of the group descriptor on page 501, it seems like there should only be one group descriptor per block group.

{503} 2nd paragraph; From the 4th line onwards, the text discusses an example regarding inode numbers. Given values: 4096 inodes per block group, inode number 13021. The sixth line reads in part: "In this case, the inode belongs to the third block group..." This inode belongs to the fourth block group, not the third.

[531] Last line; "bytes must be automatically executed." must be changed to "bytes must be atomically executed".

(602) Entry copy_files; replace "kernel/fork.c," with "kernel/fork.c".

(602) Entry copy_fs; replace "kernel/fork.c," with "kernel/fork.c".

(602) Entry copy_sighand; replace "kernel/fork.c, and" with "kernel/fork.c".

(606) Entry "free_page_tables"; delete "function".

{625} Insert new entry; add after "wait_on_buffer": "wait_queue include/linux/wait.h 76"
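The arithmetic behind the {503} report can be checked with a short sketch. It assumes inode numbers start at 1 and block groups are indexed from 0, as in ext2; the function name is hypothetical.

```c
#include <assert.h>

/* Zero-based index of the block group holding inode number ino,
 * given the number of inodes per block group. */
static unsigned long demo_inode_group(unsigned long ino,
                                      unsigned long inodes_per_group)
{
    assert(ino >= 1 && inodes_per_group > 0);
    return (ino - 1) / inodes_per_group;
}
```

For inode 13021 with 4096 inodes per group, the result is index 3. That is the fourth block group when counting from one, and the block group numbered 3 when counting from zero.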