Grunt[4] is Pig’s interactive shell. It enables users to enter Pig Latin interactively and provides a shell for users to interact with HDFS.
To enter Grunt, invoke Pig with no script or command to run. Typing:

pig -x local

will result in the prompt:

grunt>
This gives you a Grunt shell to interact with your local filesystem. If you omit the -x local and have a cluster configuration set in PIG_CLASSPATH, this will put you in a Grunt shell that will interact with HDFS on your cluster.
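For example, assuming PIG_CLASSPATH already points at a directory containing your cluster's configuration files (the exact setup varies by installation), starting Pig with no arguments opens a Grunt shell connected to that cluster:

pig
grunt>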
As you would expect with a shell, Grunt provides command-line history and editing, as well as Tab completion. It does not provide filename completion via the Tab key. That is, if you type kil and then press the Tab key, it will complete the command as kill. But if you have a file foo in your local directory and type ls fo, and then hit Tab, it will not complete it as ls foo. This is because the response time from HDFS to connect and find whether the file exists is too slow to be useful.
Although Grunt is a useful shell, remember that it is not a full-featured shell. It does not provide a number of commands found in standard Unix shells, such as pipes, redirection, and background execution.
To exit Grunt you can type quit or enter Ctrl-D.
One of the main uses of Grunt is to enter Pig Latin in an interactive session. This can be particularly useful for quickly sampling your data and for prototyping new Pig Latin scripts.
You can enter Pig Latin directly into Grunt. Pig will not start executing the Pig Latin you enter until it sees either a store or a dump. However, it will do basic syntax and semantic checking to help you catch errors quickly. If you do make a mistake while entering a line of Pig Latin in Grunt, you can reenter the line using the same alias, and Pig will take the last instance of the line you enter. For example:
pig -x local
grunt> dividends = load 'NYSE_dividends' as (exchange, symbol, date, dividend);
grunt> symbols = foreach dividends generate symbl;
...Error during parsing. Invalid alias: symbl ...
grunt> symbols = foreach dividends generate symbol;
...
Besides entering Pig Latin interactively, Grunt's other major use is to act as a shell for HDFS. In versions 0.5 and later of Pig, all hadoop fs shell commands are available. They are accessed using the keyword fs. The dash (-) used with hadoop fs is also required:

grunt> fs -ls
You can see a complete guide to the available
commands at http://hadoop.apache.org/common/docs/r0.20.2/hdfs_shell.html.
A number of the commands come directly from Unix shells and will operate in ways that are familiar: chgrp, chmod, chown, cp, du, ls, mkdir, mv, rm, and stat.
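As a quick illustration, a Grunt session might use a few of these familiar commands like this (the paths shown are hypothetical):

grunt> fs -mkdir /user/alice/data
grunt> fs -ls /user/alice
grunt> fs -mv /user/alice/data /user/alice/archive
grunt> fs -du /user/alice/archive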
A few of them either look like Unix commands you are used to but behave slightly differently or are unfamiliar, including:

cat filename
    Print the contents of a file to stdout. You can apply this command to a directory and it will apply itself in turn to each file in the directory.

copyFromLocal localfile hdfsfile
    Copy a file from your local disk to HDFS. This is done serially, not in parallel.

copyToLocal hdfsfile localfile
    Copy a file from HDFS to your local disk. This is done serially, not in parallel.

rmr filename
    Remove files recursively. This is equivalent to rm -r in Unix. Use this with caution.
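As a sketch of how these fit together, a hypothetical session might copy a file into HDFS, inspect it, copy it back out, and remove an old directory (all file and directory names here are made up for illustration):

grunt> fs -copyFromLocal NYSE_dividends /user/alice/NYSE_dividends
grunt> fs -cat /user/alice/NYSE_dividends
grunt> fs -copyToLocal /user/alice/NYSE_dividends local_dividends
grunt> fs -rmr /user/alice/old_results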
In versions of Pig before 0.5, hadoop fs commands were not available. Instead, Grunt had its own implementation of some of these commands: cat, cd, copyFromLocal, copyToLocal, cp, ls, mkdir, mv, pwd, rm (which acted like Hadoop's rmr, not Hadoop's rm), and rmf. As of Pig 0.8, all of these commands are still available. However, with the exception of cd and pwd, these commands are deprecated in favor of using hadoop fs, and they might be removed at some point in the future.
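For instance, the still-supported cd and pwd can be used to move around within Grunt, while the file operations themselves go through fs (the directory name here is hypothetical):

grunt> cd /user/alice
grunt> pwd
grunt> fs -ls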
In version 0.8, a new command was added to Grunt: sh. This command gives you access to the local shell, just as fs gives you access to HDFS. Simple shell commands that do not involve pipes or redirects can be executed. It is better to work with absolute paths, as sh does not always properly track the current working directory.
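For example, a simple command with no pipes or redirects, using an absolute path as suggested above:

grunt> sh ls /tmp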
Grunt also provides commands for controlling Pig and MapReduce:
kill jobid
    Kill the MapReduce job associated with jobid. The output of the pig command that spawned the job will list the ID of each job it spawns. You can also find the job's ID by looking at Hadoop's JobTracker GUI, which lists all jobs currently running on the cluster. Note that this command kills a particular MapReduce job. If your Pig job contains other MapReduce jobs that do not depend on the killed MapReduce job, these jobs will still continue. If you want to kill all of the MapReduce jobs associated with a particular Pig job, it is best to terminate the process running Pig, and then use this command to kill any MapReduce jobs that are still running. Make sure to terminate the Pig process with a Ctrl-C or a Unix kill, not a Unix kill -9. The latter does not give Pig the chance to clean up temporary files it is using, which can leave garbage in your cluster.

exec [[-param param_name=param_value]] [[-param_file filename]] script
    Execute the Pig Latin script script. Aliases defined in script are not imported into Grunt. This command is useful for testing your Pig Latin scripts while inside a Grunt session. For information on the -param and -param_file options, see Parameter Substitution.

run [[-param param_name=param_value]] [[-param_file filename]] script
    Execute the Pig Latin script script in the current Grunt shell. Thus all aliases referenced in script are available to Grunt, and the commands in script are accessible via the shell history. This is another option for testing Pig Latin scripts while inside a Grunt session. For information on the -param and -param_file options, see Parameter Substitution.
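As a sketch of how these commands might be used together in a session (the script name, parameter, and job ID below are hypothetical):

grunt> exec -param input=NYSE_dividends daily.pig
grunt> run daily.pig
grunt> kill job_201110101453_0001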
[4] According to Ben Reed, one of the researchers at Yahoo! who helped start Pig, they named the shell “Grunt” because they felt the initial implementation was so limited that it was not worthy even of the name “oink.”