This chapter sets the groundwork for the other chapters. It explains how to download, install, and run R.
More importantly, it also explains how to get answers to your questions. The R community provides a wealth of documentation and help. You are not alone. Here are some common sources of help:
- Local, installed documentation
When you install R on your computer, a mass of documentation is also installed. You can browse the local documentation (Recipe 1.6) and search it (Recipe 1.8). I am amazed how often I search the Web for an answer only to discover it was already available in the installed documentation.
- Task views
A task view describes packages that are specific to one area of statistical work, such as econometrics, medical imaging, psychometrics, or spatial statistics. Each task view is written and maintained by an expert in the field. There are 28 such task views, so there is likely to be one or more for your areas of interest. I recommend that every beginner find and read at least one task view in order to gain a sense of R’s possibilities (Recipe 1.11).
- Package documentation
Most packages include useful documentation. Many also include overviews and tutorials, called vignettes in the R community. The documentation is kept with the packages in package repositories, such as CRAN, and it is automatically installed on your machine when you install a package.
- Mailing lists
Volunteers have generously donated many hours of time to answer beginners’ questions that are posted to the R mailing lists. The lists are archived, so you can search the archives for answers to your questions (Recipe 1.12).
- Question and answer (Q&A) websites
On a Q&A site, anyone can post a question, and knowledgeable people can respond. Readers vote on the answers, so the best answers tend to emerge over time. All this information is tagged and archived for searching. These sites are a cross between a mailing list and a social network; the Stack Overflow site is a good example.
- The Web
The Web is loaded with information about R, and there are R-specific tools for searching it (Recipe 1.10). The Web is a moving target, so be on the lookout for new, improved ways to organize and search information regarding R.
Windows and OS X users can download R from CRAN, the Comprehensive R Archive Network. Linux and Unix users can install R packages using their package management tool:
- Windows
Open http://www.r-project.org/ in your browser.
Click on “CRAN”. You’ll see a list of mirror sites, organized by country.
Select a site near you.
Click on “Windows” under “Download and Install R”.
Click on “base”.
Click on the link for downloading the latest version of R (an .exe file).
When the download completes, double-click on the .exe file and answer the usual questions.
- OS X
Open http://www.r-project.org/ in your browser.
Click on “CRAN”. You’ll see a list of mirror sites, organized by country.
Select a site near you.
Click on “MacOS X”.
Click on the .pkg file for the latest version of R, under “Files:”, to download it.
When the download completes, double-click on the .pkg file and answer the usual questions.
- Linux or Unix
The major Linux distributions have packages for installing R. Here are some examples:
Distribution        Package name
Ubuntu or Debian    r-base
Red Hat or Fedora   R.i386
Suse                R-base
Use the system’s package manager to download and install the package. Normally, you will need the root password or sudo privileges; otherwise, ask a system administrator to perform the installation.
Installing R on Windows or OS X is straightforward because there are prebuilt binaries for those platforms. You need only follow the preceding instructions. The CRAN Web pages also contain links to installation-related resources, such as frequently asked questions (FAQs) and tips for special situations (“How do I install R when using Windows Vista?”) that you may find useful.
Theoretically, you can install R on Linux or Unix in one of two ways: by installing a distribution package or by building it from scratch. In practice, installing a package is the preferred route. The distribution packages greatly streamline both the initial installation and subsequent updates.
On Ubuntu or Debian, use apt-get to download and install R. Run under sudo to have the necessary privileges:
$ sudo apt-get install r-base
On Red Hat or Fedora, use yum:
$ sudo yum install R.i386
Most platforms also have graphical package managers, which you might find more convenient.
Beyond the base packages, I recommend installing the documentation packages, too. On my Ubuntu machine, for example, I installed r-base-html (because I like browsing the hyperlinked documentation) as well as r-doc-html, which installs the important R manuals locally:
$ sudo apt-get install r-base-html r-doc-html
Some Linux repositories also include prebuilt copies of R packages available on CRAN. I don’t use them because I’d rather get my software directly from CRAN itself, which usually has the freshest versions.
In rare cases, you may need to build R from scratch. You might have an obscure, unsupported version of Unix; or you might have special considerations regarding performance or configuration. The build procedure on Linux or Unix is quite standard. Download the tarball from the home page of your CRAN mirror; it’s called something like R-2.12.1.tar.gz, except the “2.12.1” will be replaced by the latest version. Unpack the tarball, look for a file called INSTALL, and follow the directions.
R in a Nutshell (O’Reilly) contains more details of downloading and installing R, including instructions for building the Windows and OS X versions. Perhaps the ultimate guide is the one entitled R Installation and Administration, available on CRAN, which describes building and installing R on a variety of platforms.
This recipe is about installing the base package. See Recipe 3.9 for installing add-on packages from CRAN.
- Windows
Click on Start → All Programs → R; or double-click on the R icon on your desktop (assuming the installer created an icon for you).
- OS X
Either click on the icon in the Applications directory or put the R icon on the dock and click on the icon there. Alternatively, you can just type R on a Unix command line in a shell.
- Linux or Unix
Start the R program from the shell prompt using the R command (uppercase R).
When you start R, it opens a new window. The window includes a text pane, called the R Console, where you enter R expressions (see Figure 1-1).
There is an odd thing about the Windows Start menu for R. Every time you upgrade to a new version of R, the Start menu expands to contain the new version while keeping all the previously installed versions. So if you’ve upgraded, you may face several choices such as “R 2.8.1”, “R 2.9.1”, “R 2.10.1”, and so forth. Pick the newest one. (You might also consider uninstalling the older versions to reduce the clutter.)
Using the Start menu is cumbersome, so I suggest starting R in one of two other ways: by creating a desktop shortcut or by double-clicking on your .RData file.
The installer may have created a desktop icon. If not, creating a shortcut is easy: follow the Start menu to the R program, but instead of left-clicking to run R, press and hold your mouse’s right button on the program name, drag the program name to your desktop, and release the mouse button. Windows will ask if you want to Copy Here or Move Here. Select Copy Here, and the shortcut will appear on your desktop.
Another way to start R is by double-clicking on a .RData file in your working directory. This is the file that R creates to save your workspace. The first time you create a directory, start R and change to that directory. Save your workspace there, either by exiting or using the save.image function. That will create the .RData file. Thereafter, you can simply open the directory in Windows Explorer and then double-click on the .RData file to start R.
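The save step can be sketched as follows. This is only an illustration: a scratch directory under tempdir() stands in for your own project directory, and the names project_dir and x are made up for the example.

```r
# A scratch directory stands in for your project directory (illustrative assumption)
project_dir <- file.path(tempdir(), "myproject")
dir.create(project_dir, showWarnings = FALSE)
old_dir <- setwd(project_dir)   # change to the directory, remembering the old one

x <- c(1, 2, 3)                 # something worth saving
save.image()                    # writes the workspace to .RData in the working directory
saved <- file.exists(".RData")  # check that the file was written
print(saved)

setwd(old_dir)                  # return to where we started
```

On Windows, double-clicking the resulting .RData file in that directory would then start R there.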
Perhaps the most baffling aspect of starting R on Windows is embodied in a simple question: When R starts, what is the working directory? The answer, of course, is that “it depends”:
If you start R from the Start menu, the working directory is normally either C:\Documents and Settings\<username>\My Documents (Windows XP) or C:\Users\<username>\Documents (Windows Vista, Windows 7). You can override this default by setting the R_USER environment variable to an alternative directory path.
If you start R from a desktop shortcut, you can specify an alternative startup directory that becomes the working directory when R is started. To specify the alternative directory, right-click on the shortcut, select Properties, enter the directory path in the box labeled “Start in”, and click OK.
Starting R by double-clicking on your .RData file is the most straightforward solution to this little problem. R will automatically change its working directory to be the file’s directory, which is usually what you want.
In any event, you can always use the getwd function to discover your current working directory (Recipe 3.1).
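Here is a small sketch of checking and changing the working directory. The temporary directory is used only so the example runs anywhere; in practice you would give the path to your own project.

```r
start_dir <- getwd()          # where R is working right now
print(start_dir)

old_dir <- setwd(tempdir())   # setwd returns the previous directory
print(getwd())                # now the temporary directory

setwd(old_dir)                # restore the original working directory
```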
Just for the record, Windows also has a console version of R called Rterm.exe. You’ll find it in the bin subdirectory of your R installation. It is much less convenient than the graphic user interface (GUI) version, and I never use it. I recommend it only for batch (noninteractive) usage such as running jobs from the Windows scheduler. In this book, I assume you are running the GUI version of R, not the console version.
Run R by clicking the R icon in the Applications folder. (If you use R frequently, you can drag it from the folder to the dock.) That will run the GUI version, which is somewhat more convenient than the console version. The GUI version displays your working directory, which is initially your home directory.
OS X also lets you run the console version of R by typing R at the shell prompt.
Start the console version of R from the Unix shell prompt simply by typing R, the name of the program. Be careful to type an uppercase R, not a lowercase r.
The R program has a bewildering number of command line options. Use the --help option to see the complete list.
See Recipe 1.4 for exiting from R, Recipe 3.1 for more about the current working directory, Recipe 3.2 for more about saving your workspace, and Recipe 3.11 for suppressing the start-up message. See Chapter 2 of R in a Nutshell.
Simply enter expressions at the command prompt. R will evaluate them and print (display) the result. You can use command-line editing to facilitate typing.
R prompts you with “>”. To get started, just treat R like a big calculator: enter an expression, and R will evaluate the expression and print the result:
> 1+1
[1] 2
The computer adds one and one, giving two, and displays the result.
The [1] before the 2 might be confusing. To R, the result is a vector, even though it has only one element. R labels the value with [1] to signify that this is the first element of the vector...which is not surprising, since it’s the only element of the vector.
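The index label makes more sense with a longer vector. In this brief sketch (the variable names are arbitrary), the first result has a single element, while the second is long enough that R may wrap it across several lines, each starting with the bracketed index of its first element:

```r
x <- 1 + 1
print(is.vector(x))   # the result is a vector
print(length(x))      # with exactly one element

# A longer vector may wrap across lines; each printed line begins
# with the index of its first element in square brackets.
y <- seq(from = 101, to = 130)
print(y)
```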
R will prompt you for input until you type a complete expression. The expression max(1,3,5) is a complete expression, so R stops reading input and evaluates what it’s got:
> max(1,3,5)
[1] 5
In contrast, “max(1,3,” is an incomplete expression, so R prompts you for more input. The prompt changes from greater-than (>) to plus (+), letting you know that R expects more:
> max(1,3,
+ 5)
[1] 5
It’s easy to mistype commands, and retyping them is tedious and frustrating. So R includes command-line editing to make life easier. It defines single keystrokes that let you easily recall, correct, and reexecute your commands. My own typical command-line interaction goes like this:
I enter an R expression with a typo.
R complains about my mistake.
I press the up-arrow key to recall my mistaken line.
I use the left and right arrow keys to move the cursor back to the error.
I use the Delete key to delete the offending characters.
I type the corrected characters, which inserts them into the command line.
I press Enter to reexecute the corrected command.
That’s just the basics. R supports the usual keystrokes for recalling and editing command lines, as listed in Table 1-1.
Table 1-1. Keystrokes for command-line editing
On Windows and OS X, you can also use the mouse to highlight commands and then use the usual copy and paste commands to paste text into a new command line.
See Recipe 2.13. From the Windows main menu, follow Help → Console for a complete list of keystrokes useful for command-line editing.
On all platforms, you can also use the q function (as in quit) to terminate the program.
> q()
Note the empty parentheses, which are necessary to call the function.
Whenever you exit, R asks if you want to save your workspace. You have three choices:
Save your workspace and exit.
Don’t save your workspace, but exit anyway.
Cancel, returning to the command prompt rather than exiting.
If you save your workspace, then R writes it to a file called .RData in the current working directory. This will overwrite the previously saved workspace, if any, so don’t save if you don’t like the changes to your workspace (e.g., if you have accidentally erased critical data).
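If you would rather not be prompted, the q function accepts a save argument. The sketch below only inspects the argument list; the actual calls are commented out because running either one would end the session on the spot.

```r
# q takes a save argument, so the choice can be made without the prompt.
print(names(formals(q)))

# Either call would end the session immediately, so both are left commented out:
# q(save = "no")    # exit without saving the workspace
# q(save = "yes")   # save the workspace to .RData, then exit
```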
See Recipe 3.1 for more about the current working directory and Recipe 3.2 for more about saving your workspace. See Chapter 2 of R in a Nutshell.
You want to interrupt a long-running computation and return to the command prompt without exiting R.
Interrupting R can leave your variables in an indeterminate state, depending upon how far the computation had progressed. Check your workspace after interrupting.
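One simple way to check is to list the workspace and inspect the variables the computation was updating. The names results and counter below are hypothetical stand-ins for whatever your own code was modifying:

```r
# Hypothetical variables that an interrupted computation might have been updating
results <- numeric(0)
counter <- 0

print(ls())                # list the objects currently in the workspace
str(results)               # inspect the structure of a possibly half-updated variable
print(exists("counter"))   # confirm that a variable still exists
```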
See Recipe 1.4.
Use the help.start function to see the documentation’s table of contents:
> help.start()
From there, links are available to all the installed documentation.
The base distribution of R includes a wealth of documentation—literally thousands of pages. When you install additional packages, those packages contain documentation that is also installed on your machine.
It is easy to browse this documentation via the help.start function, which opens a window on the top-level table of contents; see Figure 1-2.
The two links in the Reference section are especially useful:
- Packages
Click here to see a list of all the installed packages, both in the base packages and the additional, installed packages. Click on a package name to see a list of its functions and datasets.
- Search Engine & Keywords
Click here to access a simple search engine, which allows you to search the documentation by keyword or phrase. There is also a list of common keywords, organized by topic; click one to see the associated pages.
The local documentation is copied from the R Project website, which may have updated documents.
Use help to display the documentation for the function:
> help(functionname)
Use args for a quick reminder of the function arguments:
> args(functionname)
Use example to see examples of using the function:
> example(functionname)
I present many R functions in this book. Every R function has more bells and whistles than I can possibly describe. If a function catches your interest, I strongly suggest reading the help page for that function. One of its bells or whistles might be very useful to you.
Suppose you want to know more about the mean function. Use the help function like this:
> help(mean)
This will either open a window with function documentation or display the documentation on your console, depending upon your platform. A shortcut for the help command is to simply type ? followed by the function name:
> ?mean
Sometimes you just want a quick reminder of the arguments to a function: What are they, and in what order do they occur? Use the args function:
> args(mean)
function (x, ...)
NULL
> args(sd)
function (x, na.rm = FALSE)
NULL
The first line of output from args is a synopsis of the function call. For mean, the synopsis shows one argument, x, which is a vector of numbers. For sd, the synopsis shows the same vector, x, and an optional argument called na.rm. (You can ignore the second line of output, which is often just NULL.)
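If you want the same information in a form you can use in code rather than just read, the formals function returns the arguments as a named list. This is a small aside rather than part of the recipe:

```r
# formals returns a function's arguments as a named list
print(names(formals(sd)))    # argument names, in order
print(formals(sd)$na.rm)     # default value of the optional argument
```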
Most documentation for functions includes examples near the end. A cool feature of R is that you can request that it execute the examples, giving you a little demonstration of the function’s capabilities. The documentation for the mean function, for instance, contains examples, but you don’t need to type them yourself. Just use the example function to watch them run:
> example(mean)
mean> x <- c(0:10, 50)
mean> xm <- mean(x)
mean> c(xm, mean(x, trim = 0.1))
[1] 8.75 5.50
mean> mean(USArrests, trim = 0.2)
Murder Assault UrbanPop Rape
7.42 167.60 66.20 20.16
The user typed example(mean). Everything else was produced by R, which executed the examples from the help page and displayed the results.
See Recipe 1.8 for searching for functions and Recipe 3.5 for more about the search path.
You want to know more about a function that is installed on your machine, but the help function reports that it cannot find documentation for any such function.
Alternatively, you want to search the installed documentation for a keyword.
Use help.search to search the R documentation on your computer:
> help.search("pattern")
A typical pattern is a function name or keyword. Notice that it must be enclosed in quotation marks.
For your convenience, you can also invoke a search by using two question marks (in which case the quotes are not required):
> ??pattern
You may occasionally request help on a function only to be told R knows nothing about it:
> help(adf.test)
No documentation for 'adf.test' in specified packages and libraries:
you could try 'help.search("adf.test")'
This can be frustrating if you know the function is installed on your machine. Here the problem is that the function’s package is not currently loaded, and you don’t know which package contains the function. It’s a kind of catch-22 (the error message indicates the package is not currently in your search path, so R cannot find the help file; see Recipe 3.5 for more details).
The solution is to search all your installed packages for the function. Just use the help.search function, as suggested in the error message:
> help.search("adf.test")
The search will produce a listing of all packages that contain the function:
Help files with alias or concept or title matching 'adf.test' using
regular expression matching:
tseries::adf.test       Augmented Dickey-Fuller Test
Type '?PKG::FOO' to inspect entry 'PKG::FOO TITLE'.
This output, for example, indicates that the tseries package contains the adf.test function. You can see its documentation by explicitly telling help which package contains the function:
> help(adf.test, package="tseries")
Alternatively, you can insert the tseries package into your search list and repeat the original help command, which will then find the function and display the documentation.
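Both approaches can be tried without installing anything extra. The sketch below substitutes the splines package and its bs function for tseries and adf.test, on the assumption that you have a standard R installation: splines ships with R but is not attached by default, so it behaves the same way.

```r
# Before attaching the package, name it explicitly:
found <- help("bs", package = "splines")
print(length(found) > 0)         # was a help page located?

# Or attach the package, after which a plain help call can find the page:
library(splines)
found_after <- help("bs")
print(length(found_after) > 0)
```

Typing either help call at the prompt, without assigning the result, displays the page itself.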
You can broaden your search by using keywords. R will then find any installed documentation that contains the keywords. Suppose you want to find all functions that mention the Augmented Dickey–Fuller (ADF) test. You could search on a likely pattern:
> help.search("dickey-fuller")
On my machine, the result looks like this because I’ve installed two additional packages (fUnitRoots and urca) that implement the ADF test:
Help files with alias or concept or title matching 'dickey-fuller' using
fuzzy matching:
fUnitRoots::DickeyFullerPValues   Dickey-Fuller p Values
tseries::adf.test                 Augmented Dickey-Fuller Test
urca::ur.df                       Augmented-Dickey-Fuller Unit Root Test
Type '?PKG::FOO' to inspect entry 'PKG::FOO TITLE'.
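The search result can also be captured and examined in code instead of read from the display. The following is a sketch that assumes a recent version of R, in which the returned object carries a matches data frame with Package and Topic columns. It searches for "median" in the stats package so that it works on any installation.

```r
# Restrict the search to help-page aliases in one package that is always installed
hits <- help.search("median", fields = "alias", package = "stats")

print(unique(hits$matches$Package))   # packages with a matching help page
print(head(hits$matches$Topic))       # the matching help topics
```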
You can also access the local search engine through the documentation browser; see Recipe 1.6 for how this is done. See Recipe 3.5 for more about the search path and Recipe 4.4 for getting help on functions.
Use the help function and specify a package name (without a function name):
> help(package="packagename")
Sometimes you want to know the contents of a package (the functions and datasets). This is especially true after you download and install a new package, for example. The help function can provide the contents plus other information once you specify the package name.
This call to help will display the information for the tseries package, a contributed package that you install from CRAN (it is not part of the base distribution):
> help(package="tseries")
The information begins with a description and continues with an index of functions and datasets. On my machine, the first few lines look like this:
Information on package 'tseries'
Description:
Package: tseries
Version: 0.10-22
Date: 2009-11-22
Title: Time series analysis and computational finance
Author: Compiled by Adrian Trapletti
<a.trapletti@swissonline.ch>
Maintainer: Kurt Hornik <Kurt.Hornik@R-project.org>
Description: Package for time series analysis and computational
finance
Depends: R (>= 2.4.0), quadprog, stats, zoo
Suggests: its
Imports: graphics, stats, utils
License: GPL-2
Packaged: 2009-11-22 19:03:45 UTC; hornik
Repository: CRAN
Date/Publication: 2009-11-22 19:06:50
Built: R 2.10.0; i386-pc-mingw32; 2009-12-01 19:32:47 UTC;
windows
Index:
NelPlo Nelson-Plosser Macroeconomic Time Series
USeconomic U.S. Economic Variables
adf.test Augmented Dickey-Fuller Test
arma Fit ARMA Models to Time Series
.
. (etc.)
.
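You can also pull pieces of this information into your session. The sketch below uses the stats package only because it is present in every installation; the same calls apply to any installed package, although ls requires the package to be attached.

```r
library(stats)                     # attached by default, so this is harmless

contents <- ls("package:stats")    # names of the objects the package exports
print(head(contents))

desc <- packageDescription("stats")
print(desc$Version)                # one field from the package description
```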
Some packages also include vignettes, which are additional documents such as introductions, tutorials, or reference cards. They are installed on your computer as part of the package documentation when you install the package. The help page for a package includes a list of its vignettes near the bottom.
You can see a list of all vignettes on your computer by using the vignette function:
> vignette()
You can see the vignettes for a particular package by including its name:
> vignette(package="packagename")
Each vignette has a name, which you use to view the vignette:
> vignette("vignettename")
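As a concrete sketch, the grid package is part of the R distribution and normally comes with vignettes, so it makes a convenient test case. Whether any are listed depends on how R was installed on your machine.

```r
v <- vignette(package = "grid")    # vignettes shipped with the grid package

# The result holds a matrix; the Item column gives the vignette names
print(v$results[, c("Item", "Title"), drop = FALSE])

# To open one, pass a name from the Item column (commented out because it
# launches a viewer):
# vignette("grid", package = "grid")
```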
See Recipe 1.7 for getting help on a particular function in a package.
Inside R, use the RSiteSearch function to search by keyword or phrase:
> RSiteSearch("key phrase")
Inside your browser, try using these sites for searching:
- http://rseek.org
This is a Google custom search that is focused on R-specific websites.
- http://stackoverflow.com/
Stack Overflow is a searchable Q&A site oriented toward programming issues such as data structures, coding, and graphics.
- http://stats.stackexchange.com/
The Statistical Analysis area on Stack Exchange is also a searchable Q&A site, but it is oriented more toward statistics than programming.
The RSiteSearch function will open a browser window and direct it to the search engine on the R Project website. There you will see an initial search that you can refine. For example, this call would start a search for “canonical correlation”:
> RSiteSearch("canonical correlation")
This is quite handy for doing quick web searches without leaving R. However, the search scope is limited to R documentation and the mailing-list archives.
The rseek.org site provides a wider search. Its virtue is that it harnesses the power of the Google search engine while focusing on sites relevant to R. That eliminates the extraneous results of a generic Google search. The beauty of rseek.org is that it organizes the results in a useful way.
Figure 1-3 shows the results of visiting rseek.org and searching for “canonical correlation”. The left side of the page shows general results from searching R sites. The right side is a tabbed display that organizes the search results into several categories:
Introductions
Task Views
Support Lists
Functions
Books
Blogs
Related Tools
If you click on the Introductions tab, for example, you’ll find tutorial material. The Task Views tab will show any Task View that mentions your search term. Likewise, clicking on Functions will show links to relevant R functions. This is a good way to zero in on search results.
Stack Overflow is a so-called Q&A site, which means that anyone can submit a question and experienced users will supply answers—often there are multiple answers to each question. Readers vote on the answers, so good answers tend to rise to the top. This creates a rich database of Q&A dialogs, which you can search. Stack Overflow is strongly problem oriented, and the topics lean toward the programming side of R.
Stack Overflow hosts questions for many programming languages; therefore, when entering a term into their search box, prefix it with “[r]” to focus the search on questions tagged for R. For example, searching via “[r] standard error” will select only the questions tagged for R and will avoid the Python and C++ questions.
Stack Exchange (not Overflow) has a Q&A area for Statistical Analysis. The area is more focused on statistics than programming, so use this site when seeking answers that are more concerned with statistics in general and less with R in particular.
If your search reveals a useful package, use Recipe 3.9 to install it on your machine.
Visit the list of task views at http://cran.r-project.org/web/views/. Find and read the task view for your area, which will give you links to and descriptions of relevant packages. Or visit http://rseek.org, search by keyword, click on the Task Views tab, and select an applicable task view.
Visit crantastic and search for packages by keyword.
To find relevant functions, visit http://rseek.org, search by name or keyword, and click on the Functions tab.
This problem is especially vexing for beginners. You think R can solve your problems, but you have no idea which packages and functions would be useful. A common question on the mailing lists is: “Is there a package to solve problem X?” That is the silent scream of someone drowning in R.
As of this writing, there are more than 2,000 packages available for free download from CRAN. Each package has a summary page with a short description and links to the package documentation. Once you’ve located a potentially interesting package, you would typically click on the “Reference manual” link to view the PDF documentation with full details. (The summary page also contains download links for installing the package, but you’ll rarely install the package that way; see Recipe 3.9.)
Sometimes you simply have a generic interest—such as Bayesian analysis, econometrics, optimization, or graphics. CRAN contains a set of task view pages describing packages that may be useful. A task view is a great place to start since you get an overview of what’s available. You can see the list of task view pages at http://cran.r-project.org/web/views/ or search for them as described in the Solution.
Suppose you happen to know the name of a useful package—say, by seeing it mentioned online. A complete, alphabetical list of packages is available at http://cran.r-project.org/web/packages/ with links to the package summary pages.
You can download and install an R package called sos that provides other powerful ways to search for packages; see the vignette at http://cran.r-project.org/web/packages/sos/vignettes/sos.pdf.
You have a question, and you want to search the archives of the mailing lists to see whether your question was answered previously.
Open http://rseek.org in your browser. Search for a keyword or other search term from your question. When the search results appear, click on the “Support Lists” tab.
You can perform a search within R itself. Use the RSiteSearch function to initiate a search:
> RSiteSearch("keyphrase")
The initial search results will appear in a browser. Under “Target”, select the R-help sources, clear the other sources, and resubmit your query.
This recipe is really just an application of Recipe 1.10. But it’s an important application because you should search the mailing list archives before submitting a new question to the list. Your question has probably been answered before.
CRAN has a list of additional resources for searching the Web; see http://cran.r-project.org/search.html.
The Mailing Lists page contains general information and instructions for using the R-help mailing list. Here is the general process:
Subscribe to the R-help list at the Main R Mailing List.
Read the Posting Guide for instructions on writing an effective submission.
Write your question carefully and correctly. If appropriate, include a minimal self-contained example so that others can reproduce your error or problem.
Mail your question to r-help@r-project.org.
The R mailing list is a powerful resource, but please treat it as a last resort. Read the help pages, read the documentation, search the help list archives, and search the Web. It is most likely that your question has already been answered. Don’t kid yourself: very few questions are unique.
After writing your question, submitting it is easy. Just mail it to r-help@r-project.org. You must be a list subscriber, however; otherwise your email submission may be rejected.
Your question might arise because your R code is causing an error or giving unexpected results. In that case, a critical element of your question is the minimal self-contained example:
- Minimal
Construct the smallest snippet of R code that displays your problem. Remove everything that is irrelevant.
- Self-contained
Include the data necessary to exactly reproduce the error. If the list readers can’t reproduce it, they can’t diagnose it. For complicated data structures, use the dump function to create an ASCII representation of your data and include it in your message.
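Here is a minimal sketch of using dump for this purpose. The small data frame dat is invented for the illustration. The first call prints the text you would paste into your message; the rest shows that a reader can rebuild the same object from that text.

```r
# A small data frame standing in for your real data (illustrative)
dat <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

dump("dat", file = "")      # file = "" prints the text form to the console

# Round trip: write the text to a temporary file, then re-create the
# object in a fresh environment, as a list reader would.
tmp <- tempfile(fileext = ".R")
dump("dat", file = tmp)
env <- new.env()
source(tmp, local = env)
print(all.equal(dat, env$dat))
```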
Including an example clarifies your question and greatly increases the probability of getting a useful answer.
There are actually several mailing lists. R-help is the main list for general questions. There are also many special interest group (SIG) mailing lists dedicated to particular domains such as genetics, finance, R development, and even R jobs. You can see the full list at https://stat.ethz.ch/mailman/listinfo. If your question is specific to one such domain, you’ll get a better answer by selecting the appropriate list. As with R-help, however, carefully search the SIG list archives before submitting your question.
An excellent essay by Eric Raymond and Rick Moen is entitled “How to Ask Questions the Smart Way”. I suggest that you read it before submitting any question.