The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version |
Location |
Description |
Submitted By |
Date submitted |
Date corrected |
Printed, PDF |
Page ?
? |
The book text makes use of an employees data set.
I find no reference to a set of data that can be used with the book to run through the book examples as working exercises.
Do you have a data set to download for use with the book?
Did I miss it in the text?
I have the stock data, I'm referring to the Employee data you work with but don't ever tell us where to get it so we can work along with you on the examples.
Before creating my own data set, which is tedious, thought I'd ask you if you have one already constructed and available.
Note from the Author or Editor: I'll prepare a zip file of "extras" like this, which is a reasonable request.
|
Anonymous |
Oct 30, 2012 |
Apr 03, 2015 |
PDF |
Page chapter 6
2nd paragraph |
Where can I find Employee table data?
I did not find it on your website (the link you posted in the errata).
Note from the Author or Editor: This link has the data: http://cdn.oreillystatic.com/oreilly/examples/0636920023555/prog-hive-1st-ed-data.zip
(It's on the book's page: http://shop.oreilly.com/product/0636920023555.do)
|
Anonymous |
May 18, 2015 |
|
Printed, PDF, ePub |
Page 5
United States |
In figure 1-1, there are several references to the word 'ie', but given the input, the word/token should be 'is'.
Note from the Author or Editor: Yes, this is correct. The figure is incorrect.
|
Bill Bates |
Oct 11, 2012 |
Apr 03, 2015 |
Printed, PDF, ePub |
Page 5
United States |
In figure 1.1, in the reducers, the token/key 'there' should have a value of '[1,1]' and the token/key 'uses' should have the value '[1]'. It appears these two values have been mixed up.
Note from the Author or Editor: Yes, this is correct. The figure is incorrect.
|
Bill Bates |
Oct 11, 2012 |
Apr 03, 2015 |
PDF |
Page 12
Hive sample |
'\s' SHOULD be '\\s'
Note from the Author or Editor: Correct.
|
Tatsuo Kawasaki |
Apr 20, 2013 |
Apr 03, 2015 |
|
Page 31
1st paragraph |
The chapter says: "Using DISTRIBUTE BY ... SORT BY or the shorthand CLUSTER BY clauses is a way to exploit the parallelism of SORT BY, yet achieve a total ordering across the output files."
However, I do not believe this is true. We only get sorted runs, which still need to be merged in a postprocessing step.
Note from the Author or Editor: Rereading the section and a few previous sections, I think what we were trying to say is you can get a total ordering in special cases by exploiting known ordering that already exists in the data, even when using SORT BY and not ORDER BY, which would require a final pass through one reducer no matter how many records are involved.
You would end up with reducer output blocks that don't have to be resorted together, and I think if you subsequently ran SELECT * on the output of this query, you would see a total ordering.
However, this is not at all clear from the text, which wrongly implies total ordering occurs any time you use CLUSTER BY. I don't know if any more bug-fix editions or printings will be made to this relatively old book, but if so, I would replace this whole paragraph with this:
"Using CLUSTER BY is a shorthand for DISTRIBUTE BY ... SORT BY, when the columns for both clauses are the same."
Also, I noticed "shor-hand" in the first paragraph of the CLUSTER BY section, so it should be fixed, too.
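The point about sorted runs can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Hive itself: the function name `distribute_and_sort` and the modulo partitioner are assumptions made for illustration only. It shows that sorting inside each reducer yields sorted runs, and that concatenating the runs is not a total ordering unless they are merged.

```python
from heapq import merge

def distribute_and_sort(rows, num_reducers, key=lambda r: r):
    """Simulate DISTRIBUTE BY key ... SORT BY key on integer keys."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        # Hypothetical partitioner: integer key modulo the reducer count.
        partitions[key(row) % num_reducers].append(row)
    # SORT BY orders rows only within each reducer.
    return [sorted(p, key=key) for p in partitions]

rows = [7, 2, 9, 4, 1, 8, 3, 6]
runs = distribute_and_sort(rows, num_reducers=2)

# Reading the output files one after another: each run is sorted,
# but the concatenation is not.
concatenated = [r for run in runs for r in run]

# A merge of the sorted runs is what produces a total ordering.
merged = list(merge(*runs))

print(runs)          # [[2, 4, 6, 8], [1, 3, 7, 9]]
print(concatenated)  # [2, 4, 6, 8, 1, 3, 7, 9]
print(merged)        # [1, 2, 3, 4, 6, 7, 8, 9]
```

If the partitioner happened to send contiguous key ranges to each reducer, the concatenation would already be ordered, which is the special case the note describes.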
|
Stefanie Scherzinger |
Jan 18, 2024 |
|
Printed, PDF |
Page 34
Line 10 to 14 |
It says:
hive> set env:HOME;
env:HOME=/home/yourusername
Which is correct, and again on the next line it is saying the same thing but different output (which is not possible):
hive> set env:HOME;
env:* variables can not be set.
I think what the author intended to do here is to demonstrate that setting an environment variable (env:HOME) results in an error, i.e.:
hive> set env:HOME=/abcd;
env:* variables can not be set.
Note from the Author or Editor: Ah yes. Correct. The suggested change is fine.
|
Anonymous |
Oct 16, 2014 |
Apr 03, 2015 |
Printed, PDF, ePub |
Page 42
Table of types and 4th paragraph after the table |
Our description of the new TIMESTAMP type followed the specification for the feature that was planned, but the implemented feature doesn't support all 3 formats listed. Instead, it only supports the UTC string format: "YYYY-MM-DD HH:MM:SS.FFFFFFFFF".
So, the "Literal Syntax Example" cell in the table should contain just this text: "'2012-02-03 12:34:56.123456789' (JDBC-compliant java.sql.Timestamp format)".
The 4th paragraph after the table, should be modified to say this, although we could omit the parenthetical "Note":
Values of the new TIMESTAMP type must be strings that follow the JDBC date string format convention, YYYY-MM-DD hh:mm:ss.fffffffff. (Note: when support for TIMESTAMP was under development, support for integer and float literals was planned, where they would be interpreted as seconds and seconds plus nanoseconds, respectively, since the Unix epoch time. These formats were not implemented.)
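The string shape described above can be sketched as a small validator. This is an illustration in plain Python, not Hive's own parser: the helper name `parse_timestamp_literal` is hypothetical, and treating the fractional part as optional (as `java.sql.Timestamp.valueOf` does) is an assumption about Hive's behavior.

```python
import re
from datetime import datetime

# YYYY-MM-DD hh:mm:ss with an optional fraction of up to 9 digits.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?$")

def parse_timestamp_literal(text):
    """Return (datetime, nanoseconds) for a JDBC-style string, else None."""
    m = _TS.match(text)
    if m is None:
        return None
    try:
        base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None  # right shape, but not a real date or time
    frac = m.group(2) or ""
    # Pad the fraction on the right so ".5" means 500,000,000 ns.
    nanos = int(frac.ljust(9, "0")) if frac else 0
    return base, nanos

good = parse_timestamp_literal("2012-02-03 12:34:56.123456789")
# Integer epoch seconds were planned but not implemented, so reject them.
planned_only = parse_timestamp_literal("1328272496")
```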
|
Dean Wampler |
Oct 21, 2012 |
Apr 03, 2015 |
PDF |
Page 44
4th paragraph |
"and the key would either be a percentage"
should be
"and the value would either be a percentage"
Note from the Author or Editor: Correct. Should be the word "value".
|
Nick Wilson |
Feb 05, 2014 |
Apr 03, 2015 |
PDF |
Page 53
CREATE TABLE statement |
It seems LOCATION should be put before TBLPROPERTIES.
READ:
:
TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...)
LOCATION '/user/hive/warehouse/mydb.db/employees';
SHOULD READ:
:
LOCATION '/user/hive/warehouse/mydb.db/employees'
TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...);
Note from the Author or Editor: Yes, this is correct.
|
Tatsuo Kawasaki |
Apr 21, 2013 |
Apr 03, 2015 |
PDF |
Page 68, 122, 123, 127
|
Data Type 'LONG' is not supported in Hive. It should be 'BIGINT'.
Note from the Author or Editor: OOPS! Yes, these LONGs should be BIGINTs in the Hive queries.
|
Tatsuo Kawasaki |
Apr 21, 2013 |
Apr 03, 2015 |
PDF |
Page 71
1 |
The book text makes use of an employees data set.
I find no reference to a set of data that can be used with the book to run through the book examples as working exercises.
Do you have a data set to download for use with the book?
Did I miss it in the text?
I have the stock data, I'm referring to the Employee data you work with but don't ever tell us where to get it so we can work along with you on the examples.
Before creating my own data set, which is tedious, thought I'd ask you if you have one already constructed and available.
Note from the Author or Editor: I have posted a small example file here: http://polyglotprogramming.com/employees.txt
It uses the default delimiters for Hive, so just create the table as described in the book and drop this file in the appropriate HDFS directory.
|
Anonymous |
Feb 21, 2013 |
Apr 03, 2015 |
PDF |
Page 72
5th paragraph |
"If you specify the OVERWRITE keyword, any data already present in the target directory will be deleted first. Without the keyword, the new files are simply added to the target directory. However, if files already exist in the target directory that match filenames being loaded, the old files are overwritten.
Versions of Hive before v0.9.0 had the following bug: when the OVERWRITE keyword was not used, an existing data file in the target directory would be overwritten if its name matched the name of a data file being written to the directory. Hence, data would be lost. This bug was fixed in the v0.9.0 release."
There is a discrepancy as both imply the same - before and after the v0.9.0/bug-fix.
Note from the Author or Editor: I think what's confusing is the fact that the callout says the same thing, in more detail, as the last sentence of the previous paragraph. So, I suggest deleting the sentence at the end of the paragraph. I.e., delete "However, if files already exist.. overwritten."
|
Ramesh R N |
Jun 10, 2013 |
Apr 03, 2015 |
PDF |
Page 75
First full paragraph, following table at top of page |
This sentence has a minor grammatical error:
"So, for example, our first example using dynamic partitioning for all partitions might actually look this, where..."
I think the word "like" was left out, and should probably be this instead:
"So, for example, our first example using dynamic partitioning for all partitions might actually look like this, where..."
Note from the Author or Editor: Yes, the word "like" should be there.
|
Tom Wheeler |
Aug 06, 2013 |
Apr 03, 2015 |
PDF |
Page 75
Last code snippet in page |
In 'CREATE TABLE ca_employees' code snippet, should the
WHERE se.state = 'CA'
clause be
WHERE state= 'CA'
instead (i.e., without the 'se' prefix, which is not defined in the code snippet)?
Note from the Author or Editor: Correct. Remove the "se." from the clause as described
|
ctsats |
Mar 20, 2015 |
Apr 03, 2015 |
PDF |
Page 103
First paragraph |
Towards the top of page 103 it says, "The partition filters are ignored for OUTER JOINTS" but the last word should instead be "JOINS"
Note from the Author or Editor: Unless we're smoking something ;)
Correct. should be "JOINS".
|
Tom Wheeler |
Aug 06, 2013 |
Apr 03, 2015 |
PDF |
Page 114
1st and 2nd lines of data listing, end of page |
There are two lines that end with same string (and both cross line breaks):
...^Btarry- town^Apart\^Bmuffler
The \ before the last ^B should not be there in both lines.
|
Dean Wampler |
Mar 23, 2015 |
Apr 03, 2015 |
Printed, PDF |
Page 115
18th Line |
In the below example:
CREATE VIEW shipments(time, part) AS
SELECT cols["time"], cols["parts"]
FROM dynamictable
WHERE cols["type"] = "response";
The cols["parts"] should be cols["part"], i.e., it should be part and not parts.
Note from the Author or Editor: Yes, should be 'cols["part"]'.
|
Anonymous |
Dec 23, 2014 |
Apr 03, 2015 |
Printed |
Page 118
The two CREATE INDEX examples |
In the two CREATE INDEX examples, the index is created on only the country column as stated in the preceding text. The comment in the HQL, however, is incorrect in stating "Employees indexed by country and name."
According to https://cwiki.apache.org/Hive/indexdev.html, "PARTITIONED BY clause may be used to specify a subset of the table's partitioning columns.". Not sure if the Hive Wiki is incorrect or your book is incorrect. Regardless, please clarify.
On the other hand, in the HQL, perhaps you meant "CREATE INDEX ... ON (country, name) ... PARTITIONED BY (country)"?
Note from the Author or Editor: Both examples are somewhat nonsensical, in that they should index by (country, name) and just partition by (country). If you partitioned by employee name, you would have very tiny files!!
So, the following should be the code at the top of page 118:
CREATE INDEX employees_index
ON TABLE employees (country, name)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
PARTITIONED BY (country)
COMMENT 'Employees indexed by country and name.';
The second example on 118 should be:
CREATE INDEX employees_index
ON TABLE employees (country, name)
AS 'BITMAP'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
PARTITIONED BY (country)
COMMENT 'Employees indexed by country and name.';
|
Anonymous |
Apr 29, 2013 |
Apr 03, 2015 |
PDF |
Page 119
Section "Rebuilding an Index" (starts on previous page) |
The statement:
ALTER INDEX employees_index ON TABLE employees PARTITION (country = 'US') REBUILD;
should not have the word TABLE. It should be
ALTER INDEX employees_index ON employees PARTITION (country = 'US') REBUILD;
Also, later on the page, the same error occurs in a different statement:
DROP INDEX IF EXISTS employees_index ON TABLE employees;
should be
DROP INDEX IF EXISTS employees_index ON employees;
|
Dean Wampler |
Feb 23, 2013 |
Apr 03, 2015 |
PDF |
Page 119
Below "Dropping an Index" heading |
The book suggests the following command for dropping an index:
DROP INDEX IF EXISTS employees_index ON TABLE employees;
However, Hive 0.10 gives a ParseException ("extraneous input 'TABLE' expecting Identifier near '<EOF>'") in response.
It seems that the TABLE keyword is not allowed here, so the command should be:
DROP INDEX IF EXISTS employees_index ON employees;
Note from the Author or Editor: Correct. This might be a SQL language change since the book was written, or it's just a dumb mistake ;)
|
Tom Wheeler |
Jun 24, 2013 |
Apr 03, 2015 |
PDF |
Page 135
In "Optimized Join" section |
/* streamtable(table_name) */ should be
/*+ streamtable(table_name) */
|
Dean Wampler |
Feb 23, 2013 |
Apr 03, 2015 |
PDF |
Page 136
Third 'set' statement |
The local execution mode example shows setting a property named 'mapred.tmp.dir' but I don't believe Hadoop (at least widely-used versions) uses a property with this name.
There's a property named 'mapred.temp.dir', though the original JIRA (https://issues.apache.org/jira/browse/HIVE-1408) describes setting the 'mapred.local.dir' property instead (which is what I do).
Note from the Author or Editor: It should be hadoop.tmp.dir, not mapred.tmp.dir
|
Tom Wheeler |
Jun 03, 2013 |
Apr 03, 2015 |
PDF |
Page 167
Code sample for UDFZodiacSign class |
This code uses the @Description annotation, but the class isn't imported. The code does not compile as written, but this problem can be fixed by adding a line at the top of page 167:
import org.apache.hadoop.hive.ql.exec.Description;
Note from the Author or Editor: Yes, the code should have the line:
import org.apache.hadoop.hive.ql.exec.Description;
|
Tom Wheeler |
Jun 07, 2013 |
Apr 03, 2015 |
Printed |
Page 179
3rd paragraph |
Hi,
This is a very minor typo:
The third paragraph starts as " The benefit of this type of UDFT...." whereas it should be
" The benefit of this type of UDTF...." where UDTF stands for User-Defined Table Generating Function.
Regards,
Ramki.
Note from the Author or Editor: OOPS! Correct. Should be UDTF
|
Ramki Palle |
Feb 11, 2013 |
Apr 03, 2015 |
Printed |
Page 185
2nd section |
Which version of Hive implements "CREATE TEMPORARY MACRO"? I looked for more information online and all I could find was a proposed patch on a still-open ticket:
https://issues.apache.org/jira/browse/HIVE-2655
Note from the Author or Editor: This is a bit embarrassing; it appears that the feature isn't actually in any Hive release, even the latest 0.10.0.
The text should be amended to say this is a planned feature that may appear in a release soon.
|
Terran Melconian |
Nov 12, 2012 |
Apr 03, 2015 |
PDF |
Page 188
Under the heading "Identity Transform" |
"SELECT TRANSFORM (a, b)"
should be
"SELECT TRANSFORM (col1, col2)"
Note from the Author or Editor: Correct.
|
Nick Wilson |
Feb 10, 2014 |
Apr 03, 2015 |
PDF |
Page 199
2nd paragraph |
"Hive draws a clear distinction between the file format, how records are encoded in a file, the record format, and how the stream of bytes for a given record are encoded in the record."
should be
"Hive draws a clear distinction between the file format, how records are encoded in a file and the record format,how the stream of bytes for a given record are encoded in the record."
Note from the Author or Editor: Yes, the "and" is misplaced. This is the correct wording:
Hive draws a clear distinction between the file format, how records are encoded in a file, and the record format, how the stream of bytes for a given record are encoded in the record.
|
Ramesh R N |
Jun 09, 2013 |
Apr 03, 2015 |
PDF |
Page 209
7th line |
"Avro is a serialization systemit�s main feature" there appears to be at least a comma and space missing between "system" and "it's" (and there shouldn't be an apostrophe).
Note from the Author or Editor: Yes, should be "... system. It's ..."
|
peter marron |
Mar 20, 2013 |
Apr 03, 2015 |
Printed |
Page 217
java code to check "bad" table names |
Hi,
I understand that the intention of the code is to identify the external tables whose data reside inside the warehouse directory, which is /user/hive/warehouse.
The if clause code is there as:
if (t.getTableType().equals("MANAGED_TABLE") &&
! u.getPath()contains("/user/hive/warehouse") ) {
System.out.println (t.getTableName()
+ " is a non external table mounted inside /user/hive/warehouse" );
bad.add (t.getTableName());
}
There are two issues here:
1. The check inside the if clause.
2. The message in the println statement.
The code should be
if (! t.getTableType().equals("MANAGED_TABLE") &&
u.getPath()contains("/user/hive/warehouse") ) {
System.out.println (t.getTableName()
+ " is an external table mounted inside /user/hive/warehouse" );
bad.add (t.getTableName());
}
or
if (t.getTableType().equals("EXTERNAL") &&
u.getPath()contains("/user/hive/warehouse") ) {
System.out.println (t.getTableName()
+ " is an external table mounted inside /user/hive/warehouse" );
bad.add (t.getTableName());
}
Regards,
Ramki.
Note from the Author or Editor: UPDATE: Missing "." in one expression:
I think the second proposed alternative is better:
if (t.getTableType().equals("EXTERNAL") &&
u.getPath().contains("/user/hive/warehouse") ) {
System.out.println (t.getTableName()
+ " is an external table mounted inside /user/hive/warehouse" );
bad.add (t.getTableName());
}
|
Ramki Palle |
Feb 11, 2013 |
Apr 03, 2015 |
PDF |
Page 221
3rd Paragraph |
The 3rd paragraph appears to be missing at least one line. It trails off with "however it could output". It could output what?
Note from the Author or Editor: OOPS! I'm not sure what we intended for the missing part of the sentence, but let's just write "..., however it could output to text files."
|
Peter Marron |
Mar 20, 2013 |
Apr 03, 2015 |
PDF |
Page 237
first paragraph |
<value>zk1.site.pvt,zk1.site.pvt,zk1.site.pvt</value>
should it be
<value>zk1.site.pvt,zk2.site.pvt,zk3.site.pvt</value>
Note from the Author or Editor: Correct.
|
xiaobogu |
Dec 04, 2014 |
Apr 03, 2015 |
PDF |
Page 283
3rd paragraph, in the source code [SUM] |
To determine if a pageview is an origin, we'd sum the pageviews preceding this pageview. Therefore, the SUM statement should be:
SUM(IF(b.b_timestamp + 1800 >= a.a_timestamp AND
b.b_timestamp < a.a_timestamp,1,0)) AS c_nonorigin_flags
Notice that 'a' and 'b' are exchanged.
Besides, in the end of this query, the WHERE clause should be
WHERE
c.c_nonorigin_flags = 0
since 'c_nonorigin_flags' is an integer (the result of a SUM).
Note from the Author or Editor: Both suggestions are correct. These changes are actually on 284 in the PDF. The WHERE clause is at the end of the long query that starts on 283 and ends on 284. The SUM(IF(...)) is the code after the big query ends on page 284.
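The corrected logic can be checked with a short simulation. This is a minimal sketch in plain Python, assuming a single user's pageview timestamps in seconds; the function name `find_origins` and the sample data are made up for illustration. It mirrors the corrected SUM(IF(...)) by counting, for each pageview a, the pageviews b that precede it by at most 1800 seconds, and keeps those with a count of 0.

```python
def find_origins(timestamps, window=1800):
    """Return pageviews with no other pageview in the preceding window."""
    origins = []
    for a in timestamps:
        # Same condition as the corrected query:
        # b.b_timestamp + 1800 >= a.a_timestamp AND b.b_timestamp < a.a_timestamp
        nonorigin_flags = sum(
            1 for b in timestamps if b + window >= a and b < a
        )
        # Integer comparison, matching WHERE c.c_nonorigin_flags = 0.
        if nonorigin_flags == 0:
            origins.append(a)
    return origins

views = [1000, 1500, 2500, 9000, 9100]
origins = find_origins(views)
print(origins)  # [1000, 9000]
```

With 'a' and 'b' exchanged, as in the uncorrected version, the test would look at the pageviews that follow rather than precede, so the last pageview of each session would be flagged instead of the first.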
|
Zheyi Rong |
Jul 25, 2013 |
Apr 03, 2015 |
Printed |
Page 292
On the page |
"PostGreSQL" should probably be changed to "PostgreSQL"
Note from the Author or Editor: Whoops. Yes.
|
David Haguenauer |
Mar 20, 2014 |
Apr 03, 2015 |
Printed |
Page 293
On the page |
"PostGreSQL" should probably be changed to "PostgreSQL"
Note from the Author or Editor: Correct
|
David Haguenauer |
Mar 20, 2014 |
Apr 03, 2015 |
PDF |
Page 300
Middle of page |
READ: a number of segments separated by /r/n (carriage..
SHOULD READ: a number of segments separated by \r\n (carriage..
Note from the Author or Editor: Correct. This is a typo.
|
Tatsuo Kawasaki |
Apr 14, 2013 |
Apr 03, 2015 |
ePub |
Page 18013
location 18013 in Kindle version (section on HBase) |
The "CREATE TABLE" statements that are listed for HBase are incorrect. Both the internal and external DDL that is listed contain only a single non-key column mapping in the hbase.columns.mapping SERDEPROPERTIES, while they use two non-key columns in the actual table creation statement. Both of these create table statements are invalid.
The DDL statements can be corrected by either adding an additional column mapping, or reducing the number of columns defined in table definition.
Note from the Author or Editor: Correct. Should be:
CREATE TABLE hbase_stocks(key INT, name STRING, price FLOAT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,stock:val,price:val") TBLPROPERTIES ("hbase.table.name" = "stocks");
(I added ",price:val" to the SERDEPROPERTIES clause.)
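The consistency rule behind this fix can be sketched as a simple count check. This is an illustration in plain Python, not the validation that HBaseStorageHandler performs; the helper name `mapping_matches_columns` is hypothetical, and the two-entry mapping is reconstructed from the reporter's description of the original DDL.

```python
def mapping_matches_columns(columns, columns_mapping):
    """True when each Hive column has one entry in hbase.columns.mapping."""
    entries = [e.strip() for e in columns_mapping.split(",")]
    return len(entries) == len(columns)

columns = ["key", "name", "price"]

# Corrected DDL: three columns, three mapping entries.
ok = mapping_matches_columns(columns, ":key,stock:val,price:val")

# As reported: only one non-key mapping for two non-key columns.
bad = mapping_matches_columns(columns, ":key,stock:val")
```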
|
Gabriel Reid |
Feb 10, 2013 |
Apr 03, 2015 |