Pthreads Programming
by Bradford Nichols, Dick Buttlar, & Jacqueline Proulx Farrell

Unconfirmed error reports come from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader.

This page was updated April 19, 2007.

Here's a key to the markup:
[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

UNCONFIRMED errors:

{4,5,6} Figures 1-1, 1-2, 1-3;
Local variables are shown in the diagrams as i, j and k. However, in the code examples they are always shown as i, j and x. The code or the figures need changing so that they correspond.

{10} Example 1-2, lines 23 and 29;
Child processes should exit using _exit(), not exit(). I don't know whether calling exit() is really likely to be a problem here, though.

(15) 1st paragraph;
"and using the (void *) cast to *quit* the compiler." should read "quiet the compiler."

{18} 1st paragraph;
"When we're finished, the code for do_another_thing and *do_wrap_up* will resemble the code in do_one_thing in Example 1-6."
The reference to do_wrap_up should be removed. This was really stupid, but I had to download the examples from the ftp site to check this. I couldn't understand how do_wrap_up would be affected at this point of the program. As I suspected, the example code doesn't make any changes to do_wrap_up.

{24} 2nd paragraph;
The C standards require errno to be a macro, not a global variable.

{25} 1st paragraph;
strerror() is part of the standard C library, not just part of XPG4.

{72} 2nd line from bottom;
The line reads "if (cur == list)". Now, 'list' is not a parameter, local variable, global variable, or C keyword. I assume (from context, and from the discussion of this line on page 73) that what was meant is "if (cur == NULL)".

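A note on the page 10 report above: the usual rationale for _exit() in a forked child is that exit() runs atexit() handlers and flushes stdio buffers the child inherited from its parent, which can duplicate output. The following is only a minimal sketch of that pattern, not the book's Example 1-2; the helper name child_exit_status is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the book): fork a child that leaves
 * through _exit(code) and return the exit status the parent observes,
 * or -1 if fork() or waitpid() fails.
 */
int child_exit_status(int code)
{
    pid_t pid;
    int status = 0;

    /* Flush pending output first so the child does not inherit it. */
    fflush(stdout);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /*
         * Child: _exit() skips atexit() handlers and, on typical
         * implementations, does not flush inherited stdio buffers.
         */
        _exit(code);
    }

    /* Parent: wait for the child and decode its exit status. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling child_exit_status(7) from a test program should hand back the child's exit code. Whether exit() actually misbehaves in Example 1-2 depends on what the parent has buffered or registered at the time of the fork, which is the doubt the reader expresses.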
[80] watch_count() loop;
while(count <= WATCH_COUNT) relies on inc_count() incrementing count at least once more after the condition is signaled and before the wait returns. Consider what happens if the wait returns immediately after the signal, however. In this case, the while checks whether count is greater than WATCH_COUNT, finds that it is not (they are equal), and begins waiting again. But inc_count() is not going to signal again, so we have a deadlock. This can easily be fixed by changing the while condition to (count < WATCH_COUNT).

[80-81] watch_count and inc_count modules;
There are 2 mistakes/comments in the program.

1) This comment is related to the race condition in the program on page 80. The bug has been mentioned in the errata. The inc_count() threads can run and signal the condition count_hit_threshold BEFORE the watch_count() thread has executed pthread_mutex_lock(). One solution is to synchronize the three threads so that the watch_count() thread runs before the inc_count() threads. The simple solution (still some risk) is to create the watch_count() thread before the inc_count() threads, as follows:

    main()
    {
        ...
        pthread_create(&threads[0], NULL, watch_count, &threads_ids[0]);
        pthread_create(&threads[1], NULL, inc_count, &threads_ids[1]);
        pthread_create(&threads[2], NULL, inc_count, &threads_ids[2]);
        ...
    }

Of course, the best solution is for the main thread to be notified that the watch_count() thread has already executed pthread_mutex_lock().

2) There is another race condition bug even if the first bug is fixed. watch_count() won't print the count at the moment it hits COUNT_THRES (i.e., 12), because there is a race when pthread_cond_wait() wakes up. During this small window, count has already been updated (incremented). As a matter of fact, by the time the print statement executes, both inc_count threads have already finished.
That is why the printout is 20: when both inc_count threads finish, count is 20. When I change TCOUNT to 30, the printed count is 60. When I change TCOUNT to 60, the printed count is 120. Of course, this can change depending on the system. One easy solution is to assign count to another global variable (e.g., g_count) when the condition is signaled, and have watch_count simply print g_count. The following is the suggested code change:

    int g_count;  /* add another global variable and assume it is
                   * not concurrently accessed */

    main()
    {
        ...
    }

    void watch_count()
    {
        ...
        while(...){
            pthread_cond_wait();
            printf("watch_count(): Thread %d, Count is %d\n", *myid, g_count);
        }
        ...
    }

    void inc_count()
    {
        ...
        if(count == COUNT_THRES) {
            g_count = count;
            printf();
            pthread_cond_signal();
        }
        ...
    }

This is not the best solution because it depends on the assumption that g_count is not concurrently accessed, which is true in this case (it is actually accessed only once). I tried the above solutions and tested several cases on SUN Solaris 2.6, and they seem to work.

[89] Example 3-14, line 3;
I believe the code on line 3 of Example 3-14 on page 89 should read:

    if( rdwrp->writer_writing == 0 ) {

rather than the current

    if( rdwrp->writer_writing = 0 ) {

As written, the statement sets rdwrp->writer_writing to zero on every call, so the condition always evaluates as false and the -1 error return can never be taken.

(96) Example 3-20, line 17 (of text);
    if (trans_id == SHUTDOWN) . . .
It looks like there might be a "{" missing, and the if (..) should read:
    if (trans_id == SHUTDOWN) { . . .

[89] Example 3-14, line 3;
I have the February 1998 version. As in Example 3-13, we should be testing against zero. Instead, Example 3-14 assigns zero to rdwrp->writer_writing, which could lead to a nasty side effect that is very difficult to track down.
Instead, it should be like this:

    int pthread_rdwr_wunlock_np(pthread_rdwr_t *rdwrp)
    {
        pthread_mutex_lock(&(rdwrp->mutex));
        if (rdwrp->writer_writing == 0) {
            pthread_mutex_unlock(&(rdwrp->mutex));
            return -1;
        } else {
            rdwrp->writer_writing = 0;
            pthread_cond_broadcast(&(rdwrp->lock_free));
            pthread_mutex_unlock(&(rdwrp->mutex));
            return 0;
        }
    }

{127} Example 4-13;
The call to pthread_getspecific on page 127 has conn_key as the only parameter, as follows:

    saved_connp = pthread_getspecific(conn_key);

The same call on page 126 has a second parameter (a pointer to the key's data pointer), as follows:

    pthread_getspecific(conn_key, (void **) &saved_connp);

I am not sure which is consistent with the POSIX spec. However, the manpages for my host operating system are consistent with page 127, while the implementation on my target OS is consistent with page 126.

[126, 127] Re: function args;
Under Solaris 8, the man page for pthread_getspecific() shows it taking one argument of type pthread_key_t and returning a void *. Hope this helps. By the way, the confusion may have come from an accidental reference to the complementary SET function, which the same man page shows as:

    int pthread_setspecific( pthread_key_t, const void *);
                ^

I confused the definitions a couple of times myself, as the signatures for the set and get functions are shown on adjacent lines of the man page with a whole one-char difference in their names.
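On the two reports above: in the POSIX standard interface, pthread_getspecific() takes only the key and returns the stored void *, while pthread_setspecific() takes the key and the value and returns an int. The sketch below shows the two calls side by side. It reuses the name conn_key from the report, but everything else (conn_key_once, make_conn_key, store_and_fetch, tsd_roundtrip_ok) is hypothetical and is not the book's Example 4-13.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t conn_key;   /* name taken from the report above */
static pthread_once_t conn_key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once; no destructor is registered here. */
static void make_conn_key(void)
{
    int rc = pthread_key_create(&conn_key, NULL);
    assert(rc == 0);
    (void)rc;   /* keeps builds with NDEBUG free of unused warnings */
}

/* Thread body: store a per-thread pointer, then read it back. */
static void *store_and_fetch(void *value)
{
    pthread_once(&conn_key_once, make_conn_key);

    /* SET: two arguments (key, value), returns an int error code. */
    if (pthread_setspecific(conn_key, value) != 0)
        return NULL;

    /* GET: one argument (key), returns the stored void *. */
    return pthread_getspecific(conn_key);
}

/*
 * Hypothetical helper: run store_and_fetch in a new thread and
 * return 1 if the thread read back the pointer it stored, else 0.
 */
int tsd_roundtrip_ok(void)
{
    static int payload = 42;
    pthread_t tid;
    void *seen = NULL;

    if (pthread_create(&tid, NULL, store_and_fetch, &payload) != 0)
        return 0;
    if (pthread_join(tid, &seen) != 0)
        return 0;
    return seen == &payload;
}
```

A platform that ships the two-argument form shown on page 126 would reject the pthread_getspecific() call in this sketch at compile time, which is one quick way to tell which variant a given target provides.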