Linux Device Drivers, 3rd Edition by Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman The unconfirmed error reports are from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader. Here's a key to the markup: [page-number]: serious technical mistake {page-number}: minor technical mistake : important language/formatting problem (page-number): language change or minor formatting problem ?page-number?: reader question or request for clarification This page was updated April 10, 2008. UNCONFIRMED errors and comments from readers: {15} last para; "... building modules for 2.6.x requires that you have a configured and built kernel tree on your system." Technically, no. All you need (at least for 2.6.18 that I'm playing with) is a tree that has been configured and had "make modules_prepare" run against it. {15} near bottom; "... building modules for 2.6.x requires that you have a configured and built kernel tree on your system." No, only a tree that's been "make modules_prepare"d. the same inaccurate claim is made 2/3 of the way down p. 17. {16} code example at the bottom half of the page. Only one example in that page.; I am not a kernel developer. Not sure about the following. Still, 1) For completness. 2) In view of the `Initiallization and Shutdown' section on pages 31 and 32. 3) In order to help newbies. 4) It seems to me that recent 2.6 kernels emit warnings about that. I would modify: static int hello_init(void) to: static int __init hello_init(void) and: static void hello_exit(void) to: static void __exit hello_exit(void) In addition, you should add to the explanation text that, some more aspects of the example code are elaborated in the `Initiallization and Shutdown' section on pages 31 and 32. (16) code segment; given that you subsequently recommend putting the MODULE_* macros at the bottom of the module source, you might want to be consistent and move that MODULE_LICENSE invocation down as well. [16] 4th; running gcc -c hello.c (or alternately make from the sample code) absolutely does not work. The author talks about building a kernel tree, but doesn't give any samples as to how to do this (I believe I have a kernel tree already, but I figured I'd point this out). I am using both suse 9.2 and 9.3, if this helps out anything. The errors obtained during make are: /usr/include/linux/jiffies.h:16: error: parse error before jiffies_64 and the errors continue for about 400 lines. Any clarifications by the author would be most welcomed. {17} 2nd last para; As on p.15, building modules doesn't need a configured and built kernel tree, just a configured and "make modules_prepare"d tree at least for the latest versions of the kernel. (Perhaps this wasn't true at the time of printing.) {26} halfway down; there is no vermagic.o anymore -- instead, there's a "vermagic.h" header file these days. (26) 3rd para; technically, not an error at all, but it might be worth noting that vermagic.o has been replaced by include/linux/vermagic.h in newer kernsls, just in case readers with a newer kernel go looking for that file and can't find it. {26} last line; "version.h" is automatically included by "module.h" ... not these days, it isn't. {29} Fig 2-2; the set of dependencies in Fig 2-2 doesn't appear to match the actual dependencies (at least in the 2.6.18 kernel.) the dependencies as listed by "lsmod" for the 2.6.18 kernel is, in part: ... lp 12872 0 parport_pc 26788 0 parport 36424 2 lp,parport_pc ... that would seem to differ from what the dependency arrows show in that figure, but it's possible it was different back in 2.6.10. {29} Figure 2.2; I'm not sure that picture is technically accurate anymore, but I could be wrong. {30} 1/3 way down; If you already include module.h, then there is no need to additionally include moduleparam.h since (at least at the moment) module.h includes that heasder file directly already. Personally, I think that's a bad idea, and I've even written a wiki page on it: http://fsdev.net/wiki/index.php?title=Moduleparam.h (32) middle of page; "If your module does not define a cleanup function, the kernel does not allow it to be unloaded." It's not clear if this is still true if you've configured your kernel for forced unloading and use "rmmod -f". Perhaps that could be clarified. {34} 4th line from bottom; "item2" -> "item1", no? {40} 2/3 way down page; another incorrect reference to vermagic.o, should refer to vermagic.h. (41) Third line; It should read: "The FIRST form exports (...), and the SECOND limits the export (...)". The construct FORMER/LATER is also valid. (41) MODULE_* list; Is there any value in adding MODULE_FIRMWARE or the more generic MODULE_INFO macros to that list? {45} 5 lines from bottom; "request_chrdev_region" -> "register_chrdev_region" (50) In aio_write() function; loff_t it's not a pointer for this function. Where it is "ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t, loff_t *)" you should read "ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t, loff_t)" [57] Source code block in the 1st paragraph; It seems the line dev->cdev.ops = &scull_fops; is redundant because this assignment is done in the previous line cdev_init(&dev->cdev, &scull_fops); {57} line 5; int err, devno = MKDEV(scull_major, scull_minor + index); should be: int err; dev_t devno = MKDEV(scull_major, scull_minor + index); {59} last paragraph; "Shut down the device on last close" is misleading, because it supposes release function is called on every file close. Should read: "Shut down the device" {62} first para; "writing a single byte in scull consumes 8,000 or 12,000 bytes of memory..." That total counts the size of the first quantum pointer array and the first quantum, but shouldn't you also add in the size of the first scull_qset structure you have to allocate? Sure, it's tiny, but it should still be counted, no? {68} bottom of page 68 and top of page 69; To be technically correct, sizeof(char *) should say sizeof(void *) since dptr->data is of type void ** (i.e. pointer to void *). {74} 1/2 way down; CONFIG_INIT_DEBUG does not seem to exist anymore. {78} Code example and 2nd paragraph; The code example on this page returns "Invalid argument" # ./setconsole 1 ./setconsole: ioctl(stdin, TIOCLINUX): Invalid argument Code correction: 11 is the TIOCL_SETKMSGREDIRECT cmd number include/linux/tiocl.h:#define TIOCL_SETKMSGREDIRECT 11 /* restrict kernel messages to a vt */ Also, drivers/char/vt.c explains TIOCLINUX, not tty_io.c I'm still trying to figure out why it always returns EINVAL. My kernel version is: # uname -r 2.6.17-1.2174_FC5 (82) last line; The filename should probably be /etc/syslog.conf, not /etc/syslogd.conf. (84) second last para; You refer more than once to "proc_read" when you probably meant to say "read_proc", just to be consistent. (103) 3rd code snippet; There's already an errata to change "mm cf26ac0c" to "mm cf36ac0c", but the output should also be changed from "0xcf26ac0c = 0x50" to "0xcf36ac0c = 0x50". (110) bottom of page; I'm guessing you can replace the semaphore-based mutexes with real mutexes based on mutex.h, at least in some places. {128} read_seqretry() example; The example shows the read_seqretry() as: } while read_seqretry(&the_lock, seq); There must be parenthesis around the condition of 'while' in C. So the line must be changed to: } while (read_seqretry(&the_lock, seq)); {141} 5th paragraph; FIOQSIZE ioctl returns the size of symbolic link (S_IFLNK) files as well as regular and directory files. I think it would be better if the text said: "... when applied to any other file type such as device files, however, it ..." [149] last para; The descriptions of the return values of the 4 wait_event... macros are misleading, if not downright wrong. The last sentence of the paragraph says that wait_event_timeout and wait_event_interruptible_timeout return 0 on a timeout, but does not describe the other possible return values: >0 if the condition became true, no timeout or signal (value is jiffies remaining) <0 if a signal was received by the process (wait_event_interruptible_timeout only) For clarity, the return value of wait_event_interruptible should be described as: =0 if the condition became true, no signal received by the process <0 if a signal was received by the process This is compatible with wait_event_interrupible_timeout. Also, the paragraph should state the wait_event does not have any return value (these are macros, not functions). The user must not try to store / check a return value for wait_event. Note that the descriptions of wait_event and wait_event_interruptible in the 2nd edition, P. 142, were much better. The wake_up... functions are still properly described on P. 161 of the 3rd edition, but the good descriptions of the wait_event... macros disappeared between editions. The 3rd edition seems to lack that type of clear detailed description of macros and functions. {159} 2nd paragraph; This paragraph explains why there is no problem if the wakeup happens immediately before the call to schedule(). I think it should also explain why the following scenario is not a race condition: a) Wakeup occurs immediately before the call to prepare_to_wait() b) Call to prepare_to_wait() sets process state to TASK_INTERRUPTIBLE c) While the spacefree() function is running, the process is pre-empted by an unrelated process. d) The process is now still marked as TASK_INTERRUPTIBLE and will therefore not be re-scheduled and will never execute the call to finish_wait() - so it will sleep unless/until an interruption comes along... {160} 2nd paragraph from bottom; "Note that there is no way to perform exclusive waits with wait_event and its variants". However, in linux-2.6.10/include/linux/wait.h the following is defined: #define wait_event_interruptible_exclusive(wq, condition) This disagrees with the above statement. (169) first paragraph, line 7; "... data structure exactly once ..." [emphasis on precision] could be better as "... data structure just once ..." [emphasis on small value] [184] 2nd paragraph; In the beginning of the second paragraph it is stated that the jiffies counter is initialized to 0. However, I'm working with version 2.6.15 of the kernel, and the jiffies counter is initialized to -5 minutes. It is stated in jiffies.h that this is done to help detect errors when jiffies rolls over. I encountered this when I was porting a driver from 2.4.22 to 2.6.15. (186) last lines; After describing how the TSC can be read from both kernel and user space, the text refers to , which seems to be available only in kernel space. And there's no mention of , which might very well be a post-book development, I don't know. {186} 5; "...as long as that difference doesnt exceed the overflow time, you can get the work done without claiming exclusive..." The difference would never exceed the overflow time. Assume the register is one byte and wrap around 256. The difference of any two values of unsigned char will not exceed 256, which is the value of the "overflow time". Therefore, It Should be "as long as that difference doesn't exceed half of the overflow time". {200} Last paragraph; The timer API description for mod_timer and del_timer_sync show an incorrect return type of int. The Quick Reference list of same functions on page 270 at end of chapter shows the correct return value of void for these functions. [206] 2nd paragraph; The text states "The return value from these values is 0 if the work was successfully added to the queue; a nonzero result means that this work_struct structure was already waiting...". The kernel code states the following which indicates that non-zero is success: /* * Queue work on a workqueue. Return non-zero if it was successfully * added. * * We queue the work to the CPU it was submitted, but there is no * guarantee that it will be processed by that CPU. */ int fastcall queue_work(struct workqueue_struct *wq, struct work_struct *work) { int ret = 0, cpu = get_cpu(); if (!test_and_set_bit(0, &work->pending)) { if (unlikely(is_single_threaded(wq))) cpu = singlethread_cpu; BUG_ON(!list_empty(&work->entry)); __queue_work(per_cpu_ptr(wq->cpu_wq, cpu), work); ret = 1; } put_cpu(); return ret; } {226} 6th paragraph, 2nd sentence; The referenced sentence suggests using readb, yet later on page 251 it says readb is discouraged and one should use ioread8 instead. Suggest changing readb on p266 to ioread8. {226} 6th paragraph; Text refers to "readb" function - however on page 251, use of readb is described as being 'discouraged', so text should refer to "ioread8" function (p251) instead. (228) last sentence; "it would not do for a processor..." should read "it would not do for a process..." {270} 3rd full paragraph code example; The code *index = (new >= (short_buffer + PAGE_SIZE)) ? short_buffer : new; should read *index = (new >= (short_buffer + PAGE_SIZE)) ? (short_buffer + (new - short_buffer) % PAGE_SIZE): new; The current code works since delta is always 16 but if someone was to try to use this code in a situation where delta took on other values then index might not be correct when the wrap around occurred. [270] short_interrupt(), line 9; Allocation of short_buffer (from the example files): ... short_buffer = __get_free_pages(GFP_KERNEL,0); short_head = short_tail = short_buffer; ... Using short_buffer: ... /* Write a 16 byte record. Assume PAGE_SIZE is a multiple of 16 */ written = sprintf((char *)short_head,"%08u.%06u\n", (int)(tv.tv_sec % 100000000), (int)(tv.tv_usec)); BUG_ON(written != 16); ... When the last entry is written to the buffer (e.g. the 256th if PAGE_SIZE is 4096), sprintf() will write the zero termination character "behind" the buffer. So a variable stored here will be overwritten with a '\0'. {281} 2nd paragraph, 2nd line; "IRQ5 is used for the serial ATA and IEEE1394 controllers;" should read: "IRQ5 is used for the serial ATA and USB2 controllers;" (312) under section void (*remove) (struct pci_dev *dev);; You mentioned that more details about the remove function would be discussed later in this chapter. I do not see any details regarding the remove function in the chapter. [314, 326] Enabling the PCI Device, Quick Reference; You explain the function pci_enable_device. Shouldn't the function pci_disable_device also be shown for completeness? [317, 326] various, Quick Reference; You explain the functions pci_resource_start, pci_resource_end and pci_resource_flags. Shouldn't the function pci_resource_len also be there? Especially since the functions ioremap, ioremap_nocache and io_portmap require the start and length, but not the end. [317] PCI interrupts section; In recent kernels (at least as recent as 2.6.16) PCI interrupts are handled differently than described in this section. As far as I can tell the changes are to do ACPI. In the 2.6.16 kernel you can't get an irq number that works by using. pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &my_irq); You have to use the following: my_irq = dev->irq. A comment from /usr/src/linux/include/linux/pci.h: struct pci_dev { ... ... /* * Instead of touching interrupt line and base address registers * directly, use the values stored here. They might be different! */ unsigned int irq; ... ... }; [317] last paragraph; The section "PCI Interrupts" describes how to hook up a pci interrupt by reading the value programmed into the pci configuration header at PCI_INTERRUPT_LINE, and calling request_irq() with this value as the first argument. This is incorrect. Consider the following comment from linux-2.6.10/include/linux/pci.h line 523: /* * Instead of touching interrupt line and base address registers * directly, use the values stored here. They might be different! */ unsigned int irq; struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */ The comment states that in a similar manner to the base address registers, one should not access PCI_INTERRUPT_LINE directly but instead use the value of irq from struct pci_dev. This is demonstrated in linux-2.6.10/drivers/net/e100.c line 1688, for example. [341] int interval paragraph; "[...] milliseconds. For devices [...]" should read: "[...] milliseconds. For high-speed devices [...]" (355) last paragraph; "...so it should do any memory..." should read: "...so it should not do any memory..." (403) first paragraph after udev heading; The text reads "this has previously been done in user space" It should read "this has previously been done in kernel space" [429] programming example under "Remapping Specific I/O Regions"; "unsigned long psize = simple_region_size - off;" should be: "unsigned long psize = simple_region_start + simple_region_size - off);" {438} Top two code samples; Beginning (at least) with 2.6.18 the second parameter to aio_read and aio_write is an iovec pointer instead of a pointer to a char buffer with the third parameter, count, indicating the number of iovecs. This issue, of course, affects all of "An Asynchronous I/O example" on pages 439 and 440. As an aside, this affects building the sample scullX drivers on fedora core 5 and 6 machines. Also note that main.c in those example drivers includes . With Fedora core 6 you don't need full source to develop and test modules, just the kernel-devel set. The Fedora core 6 system no longer includes kernel source and recommends building external modules with just kernel-devel. is not present, at least on my Fedora core 6 system with kernel-devel loaded. It does not seem to be needed (if you ignore compile warnings that the file doesn't exist) to build scullc etc. My builds of scullc, etc., on an older Fedora Core system with source present succeed as shipped since this is a 2.14 system and aio_read|write uses a char pointer for the second parameter. [488] definition of sbull_xfre_bio() routine; There are two problems with this routine. First problem is obviously a typo: the line __bio_kunmap_atomic(bio,KM_USER0); should be __bio_kunmap_atomic(buffer,KM_USER0); as described on page 483. Second problem is that sbull_xfer_bio() calls bio_cur_sectors(bio) as it goes through the loop. The bio_cur_sectors(bio) is defined in as bio->bi_io_vec[bio->bi_idx]. Since your routine never increments bio->bi_idx, it will work only if all segments of this 'bio' have exactly the same size; it is unable to handle the case when different segments have different sizes. I think this problem can be solved by replacing calls to bio_cur_sectors(bio) with (bvec->bv_len >> KERNEL_SECTOR_SIZE). [556] tiny_write() sample code; The paragraph above says that you're not supposed to sleep in write() callback, since it can be called from interrupt context, but the sample code for tiny_write uses down(). {559} 2nd, 3rd and 5th paragraphs; As of 2.6.15 tty flip buffers have now changed. See Alan Cox comment below. Alan Cox patch 2.6.15-rc1 This replaces the tty flip buffers with kmalloc objects in rings. In the normal situation for an IRQ driven serial port at typical speeds the behaviour is pretty much the same, two buffers end up allocated and the kernel cycles between them as before. When there are delays or at high speed we now behave far better as the buffer pool can grow a bit rather than lose characters. This also means that we can operate at higher speeds reliably. The following might be a suitable substitute for the example code for (i = 0; i < data_size; ++i) { if (tty_request_buffer_room(tty, 1) == 0) break; tty_insert_flip_char(tty, data[i], TTY_NORMAL); } tty_schedule_flip(tty); [sample code] in file scull/pipe.c, scull_p_open method; Every invocation of the scull_p_open method will reset the read and write pointers (rp, wp) to buffer beginning. Doing so will cause a deadlock by putting the reader and writer in an infinite sleep in some situations. *To reproduce the bug: - cat "file_BIGGER_than_buffer_size" > /dev/scullpipe0 - "cat" will sleep cause the buffer is full - cat /dev/scullpipe0 - Buffer pointers will be reset by the statement dev->rp=dev->wp=dev->buffer. cat will sleep thinking that the buffer is empty (dev->rp == dev->wp) - reader and writer will infinitly sleep * Solution: Buffer pointers shouldn't be reset if any process opened the device before us (and possibly changed rp or wp). Check could be done using the reference counters nreaders and nwriters. --- pipe.c 2006-12-22 04:21:23.000000000 +0200 +++ pipe.c 2006-12-22 06:26:24.000000000 +0200 @@ -75,9 +75,11 @@ return -ENOMEM; } } - dev->buffersize = scull_p_buffer; - dev->end = dev->buffer + dev->buffersize; - dev->rp = dev->wp = dev->buffer; /* rd and wr from the beginning */ + if (dev->nreaders == 0 && dev->nwriters == 0) { + dev->buffersize = scull_p_buffer; + dev->end = dev->buffer + dev->buffersize; + dev->rp = dev->wp = dev->buffer; /* rd and wr from the beginning */ + }