lex & yacc, 2nd Edition
by John R. Levine, Tony Mason, and Doug Brown

Unconfirmed errors are from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader.

Send error reports and technical questions to booktech@oreilly.com.

This page was last modified on March 23, 2007.

Here's the key to the markup:
[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

UNCONFIRMED errors:

(xx) 4th paragraph, last sentence;
    Abraxas' version lex and yacc
should be:
    Abraxas' version of lex and yacc

(8) third printf statement on page;
"below" is listed twice in the list of preposition patterns:

    ... |
    below |
    between |
    below { printf("%s: is a preposition\n", yytext); }

[17] Example 1-7;
I'm using the AIX 4.3 version of lex and yacc. The example doesn't work properly unless I put a return statement in the sentence rule:

    sentence: subject VERB object { printf("Sentence is valid.\n"); return 0; }

(This occurs under FreeBSD, too.)

[23] code Example 1-9;
The C code does not have the same behavior as the lex code in Example 1-10, as claimed. In particular, try the following two test cases as input:

    .123.456
    "Here is an unclosed literal text string

The lex code treats the first example as two NUMBERs, .123 and .456, while the C code has improper control flow and treats it as a single number. By the time the C code gets to the second decimal point, it has no memory of whether it started with .123 or just 123 without a leading decimal point.

The lex code correctly detects that the second example does not fit any of its patterns except the default, so it outputs the double quote and then continues. The remaining words are then categorized as COMMANDs.
The C code makes the erroneous decision that this must be TEXT on the basis of the double-quote character, even though the quote might never be closed later.

The C code could really stand to be cleaned up quite a bit. The indentation in the NUMBER case does not match the nesting level, making it quite hard to read, and the "early K&R hacker" style of the beginning of the code is obsolete (e.g., use "int main(int argc, ..)" instead of the "void main(argc, argv)" K&R style shown).

{30-31} varies;
The regular expression used to match decimal numbers does not handle the following case:

    2.

Admittedly, one could argue that a decimal point should not appear without a following digit, but the above compiles as valid C and is read correctly by a scanf() or sscanf() call. The problem is that either the leading digits or the trailing digits may be omitted, but not both: at least one of the two is required. The following regular expression is more complete:

    -?(([0-9]+\.?)|([0-9]*\.[0-9]+))

and will match the following:

    1  1.  1.0  .01  -1  -1.  -1.0  -.01

This issue seems to be repeated whenever you are matching a number.

{31} 3rd line from bottom;
I believe the regular expression for matching a quoted string that does not extend past one line, which is listed as

    \"[^"\n]*["\n]

should be

    \"[^"\n]*\"

The expression in the book incorrectly matches a string with no terminating " on the same line, e.g.

    "this is not a string

would match because \n is allowed to terminate the pattern. My version prevents newlines in strings, but properly fails if the string is not terminated before the end of the line.

[44] line 6 of Example 2-7;
The following line:

    .+      ECHO;

has been left out. This will prevent matching of other strings when not in the MAGIC start state.

{45} 4th paragraph from bottom, 3rd chunk of example code (out of 4 chunks), 2nd line of code;
The code reads:

    ^[ \t]*\n
    \n      /* whitespace lines matched by previous rule */
    .       /* anything else */

I think the comment on line two should instead read:

    /* whitespace lines NOT matched by previous rule */

{47} Example 2-9;
Running Example 2-9 on the input

    some code /* comment */ /* tricky */

results in

    code: 3, comments 0, white space 0

while it should be

    code: 1, comments 2, white space 0

The problems are with the expressions and actions:

    .+"/*".*"*/".*\n    { code++; }

and

    .*"/*".*"*/".+\n    { code++; }

because they can inadvertently eat an opening comment /* just before the newline, without switching to the COMMENT state.

{58-65} code ch3-01.y, ch3-02.y, ch3-03.y;
The yacc files ch3-01.y, ch3-02.y, and ch3-03.y all lack a main() function, so nothing calls yyparse() and none of the examples runs correctly. I added the following to each yacc file:

    %%
    main()
    {
        yyparse();
    }

After that, all the examples run correctly.

[58-59] example ch3-01.* code and compile and output instructions on both pages;
Example ch3-01 does NOT work. I have tried this on a Win PC, SCO Unix, Red Hat 7.2 Linux, and SuSE 7.3 Linux. Using both hand-typed code and the examples from the ftp site, at NO time does the action

    { printf("= %d\n", $1); }

ever display anything on the screen as it is supposed to.

(132) Example 5-17;
scalar_exp_commalist is listed twice. The second one can be omitted.

{136} top of page;
The production for search_condition allows for an empty production. The bar (|) before "search_condition OR search_condition" should be removed. The same error appears on page 316 in the appendix and in the electronic code file.

(204) Last line of the second example;
I think that the line:

    { yyval.opval = ..............

should read:

    { yylval.opval = ..............

That is, "yylval" instead of "yyval".

{246} 2nd code example;
The trick to save the input a line at a time:

    \n.*    { strcpy(linebuf, yytext+1); lineno++; yyless(1); }

fails to save the first input line of the yyin buffer, because the pattern requires a preceding newline and the first line has none.

[286] in the middle;
At the end of Example E-1, "Flex specification to parse a command line ape-05.l" (page 285), we can see the following code:

    if (targv[0][offset+copylen] == '\0') {  /* end of arg */
        buf[copylen] = ' ';
        copylen++;
        offset = 0;
        targv++;
    }

The mistake is that this does not handle an argument longer than max (the maximum number of characters that can be copied into the buffer at one time). In that case only part of the argument can be copied into the buffer, so offset should be updated to record where copying stopped. This can be done by adding an else branch to the if statement:

    else
        offset += copylen;