Chapter 7. Data

It’s often useful to include data in a package. If the primary purpose of a package is to distribute useful functions, example datasets make it easier to write excellent documentation. These datasets can be handcrafted to provide compelling use cases for the functions in the package. Here are some examples of this type of package data:

  • tidyr: billboard (song rankings), who (tuberculosis data from the World Health Organization)

  • dplyr: starwars (Star Wars characters), storms (storm tracks)

At the other extreme, some packages exist solely to distribute data, along with its documentation. These are sometimes called “data packages.” A data package can be a nice way to share example data across multiple packages. It is also a useful technique for moving relatively large, static files out of a function-oriented package that might need more frequent updates. Here are some examples of data packages:

Finally, many packages benefit from internal data: data that the package's own functions rely on but that is not directly exposed to users of the package.

In this chapter we describe useful mechanisms for including data in your package. The practical details differ depending on who needs access to the data, how often it changes, and what its users will do with it:

  • If you want to store R objects and make them available to the user, put them in data/. This is the best place to put example datasets. All the concrete examples ...
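As a rough illustration of the data/ mechanism, the following sketch saves a single R object to a data/ directory as an .rda file using only base R. The package root (a temporary directory named mypkg), the dataset example_scores, and its values are all hypothetical and exist only for this illustration.

```r
# Hypothetical package root; a temporary directory stands in for a real package.
pkg_root <- file.path(tempdir(), "mypkg")
data_dir <- file.path(pkg_root, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# A small example dataset (made-up values, for illustration only).
example_scores <- data.frame(
  id = 1:3,
  score = c(90, 85, 77)
)

# Save the single object to data/<object name>.rda.
rda_path <- file.path(data_dir, "example_scores.rda")
save(example_scores, file = rda_path, compress = "bzip2")

# Load it into a fresh environment to confirm the round trip.
env <- new.env()
loaded_names <- load(rda_path, envir = env)
round_trip_ok <- identical(env$example_scores, example_scores)
```

Inside a real package, usethis::use_data(example_scores) is a common way to perform the same step; this sketch spells the step out in base R so that it runs outside a package.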
