Chapter 9. External Data
It’s often useful to include data in a package. If you’re releasing the package to a broad audience, it’s a way to provide compelling use cases for the package’s functions. If you’re releasing the package to a more specific audience, interested either in the data (e.g., NZ census data) or the subject (e.g., demography), it’s a way to distribute that data along with its documentation (as long as your audience is R users).
There are three main ways to include data in your package, depending on what you want to do with it and who should be able to use it:
- If you want to store binary data and make it available to the user, put it in data/. This is the best place to put example datasets.
- If you want to store parsed data, but not make it available to the user, put it in R/sysdata.rda. This is the best place to put data that your functions need.
- If you want to store raw data, put it in inst/extdata.
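As a rough sketch of how these three locations sit inside a package's source tree, the following builds a throwaway layout in a temporary directory. The package name mypkg, the objects example_data and lookup, and the CSV filename are all hypothetical and chosen only for illustration.

```r
# Hypothetical package root inside a temporary directory
pkg <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "R"), showWarnings = FALSE)
dir.create(file.path(pkg, "inst", "extdata"), recursive = TRUE, showWarnings = FALSE)

# Illustrative objects (not taken from any real package)
example_data <- data.frame(x = 1:3, y = c("a", "b", "c"))
lookup <- c(a = 1, b = 2)

# data/: binary data made available to the user
save(example_data, file = file.path(pkg, "data", "example_data.RData"))

# R/sysdata.rda: data for the package's own functions, not exported
save(lookup, file = file.path(pkg, "R", "sysdata.rda"))

# inst/extdata: raw data, here a plain CSV file
write.csv(example_data,
          file.path(pkg, "inst", "extdata", "example_data.csv"),
          row.names = FALSE)

# Show the resulting file layout
list.files(pkg, recursive = TRUE)
```

The point of the sketch is only the directory layout; in a real package you would normally let a helper create these files rather than calling save() by hand.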
A simple alternative to these three options is to include the data in the source of your package, either creating it by hand or using dput() to serialize an existing dataset into R code.
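To make the dput() route concrete, here is a minimal sketch using a small made-up data frame (the object scores and its columns are assumptions for illustration). It captures the R code that dput() emits, which is the text you would paste into a source file, and then checks that evaluating the code recreates an equivalent object.

```r
# A small example data frame (hypothetical, for illustration only)
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# dput() writes an R expression that recreates the object;
# capture.output() collects that text instead of printing it
code <- paste(capture.output(dput(scores)), collapse = "\n")
cat(code, "\n")

# Evaluating the captured expression rebuilds the object
restored <- eval(parse(text = code))
all.equal(restored, scores)
```

This approach suits small datasets best, since the generated code lives directly in your package source and grows with the data.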
Each possible location is described in more detail in the following sections.
Exported Data
The most common location for package data is (surprise!) data/. Each file in this directory should be an .RData file created by save() containing a single object (with the same name as the file). The easiest way to adhere to these rules is to use devtools::use_data():
x <- sample(1000)
devtools::use_data(x, ...
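The two rules for data/ (one object per file, file named after the object) can also be followed by hand with base R. The sketch below is not what devtools::use_data() does internally; it is a manual stand-in that assumes a temporary directory playing the role of data/, and it loads the file into a fresh environment to confirm what was stored.

```r
# Hypothetical stand-in for a package's data/ directory
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

x <- sample(1000)

# One object per file, with the file named after the object
save(x, file = file.path(data_dir, "x.RData"))

# Load into a fresh environment to inspect what the file contains;
# load() returns the names of the objects it restored
env <- new.env()
loaded <- load(file.path(data_dir, "x.RData"), envir = env)
loaded
identical(env$x, x)
```

Loading into a separate environment keeps the check from overwriting objects in your workspace, which is handy when verifying files someone else created.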